Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

Volker Simonis Tue, 22 Mar 2022 09:29:03 -0700

On Mon, Mar 21, 2022 at 8:24 PM Lance Andersen
<lance.ander...@oracle.com> wrote:


Hi Lance,

Thanks for looking into this issue. Please find my answers inline:

> Hi Volker,
>
> I have read through what you have provided/pointed to, thank you, and on the 
> surface what you are suggesting sounds reasonable.
>
> That being said given that this API dates back to 1997ish, I think we have to 
> be careful not introduce any regressions with existing applications with the 
> proposal you suggest(even though it is just relaxes the spec), as you 
> mentioned their is a jtreg test that  seems to fail.
>
> Have you run the JCK with your ZLIB implementation?  I only skimmed the tests 
> but looks like there might be a couple of tests which validate the array’s 
> contents.

Yes, I did run the JCK tests with the optimized zlib implementations
and couldn't find any problems related to this issue. We've also using
the optimized version of zlib in production for quite a while without
any problems. However, I did found a problem related to a test which
was copied from "test/jdk/java/util/zip/DeflateIn_InflateOut.java".
That jtreg test was fixed with "8240226: DeflateIn_InflateOut.java
test incorrectly assumes size of compressed" [2] in JDK 15 but
apparently that fix didn't made it into the JCK version. The test has
now been excluded from JCK 17/18.

Finally, the currently failing "nio/zipfs/ZipFSOutputStreamTest.java"
[3] jtreg test is not failing because it specifically checks that the
bytes in the output buffer beyond the last inflated byte remains
untouched. It's just because they use a poor, "lazy" method of
comparing inflated content to the original content and can easily be
fixed. Instead of:
```
while ((numRead = is.read(buf)) != -1) {
    totalRead += numRead;
    // verify the content
    Assert.assertEquals(Arrays.mismatch(buf, chunk), -1, "Unexpected content");
}
```
just use:
```
while ((numRead = is.read(buf)) != -1) {
    totalRead += numRead;
    // verify the content
    Assert.assertEquals(Arrays.mismatch(buf, 0, numRead, chunk, 0,
numRead), -1, "Unexpected content");
}
```

[1] 
https://github.com/openjdk/jdk/blob/master/test/jdk/java/util/zip/DeflateIn_InflateOut.java
[2] https://bugs.openjdk.java.net/browse/JDK-8240226
[3] 
https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java

>
> We don’t have a lot of test coverage so if we were to consider moving forward 
> with your proposal, I believe additional test coverage would be warranted.

Sure, I'll be happy to add some more testing. Do you have specific
ideas? In fact my suggestion relaxes the specification of read(..)
which would be hard to check :)

Thank you and best regards,
Volker

> Best
> Lance
>
>
> On Mar 4, 2022, at 5:04 AM, Volker Simonis <volker.simo...@gmail.com> wrote:
>
> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater
> functionality. `Inflater::inflate(byte[] output, int off, int len)`
> currently calls zlib's native `inflate(..)` function and passes the
> address of `output[off]` and `len` to it via JNI.
>
> The specification of zlib's `inflate(..)` function (i.e. the [API
> documentation in the original zlib
> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400))
> doesn't give any guarantees with regard to usage of the output buffer.
> It only states that upon completion the function will return the
> number of bytes that have been written (i.e. "inflated") into the
> output buffer.
>
> The original zlib implementation only wrote as many bytes into the
> output buffer as it inflated. However, this is not a hard requirement
> and newer, more performant implementations of the zlib library like
> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/)
> or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more
> bytes of the output buffer than they actually inflate as a scratch
> buffer. See https://github.com/simonis/zlib-chromium for a more
> detailed description of their approach and its performance benefit.
>
> These new zlib versions can still be used transparently from Java
> (e.g. by putting them into the `LD_LIBRARY_PATH` or by using
> `LD_PRELOAD`), because they still fully comply to specification of
> `Inflater::inflate(..)`. However, we might run into problems when
> using the `Inflater` functionality from the `InflaterInputStream`
> class. `InflaterInputStream` is derived from from `InputStream` and as
> such, its `read(byte[] b, int off, int len)` method is quite
> constrained. It specifically specifies that if *k* bytes have been
> read, then "these bytes will be stored in elements `b[off]` through
> `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through
> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[]
> b, int off, int len)` (which is constrained by
> `InputStream::read(..)`'s specification) calls
> `Inflater::inflate(byte[] b, int off, int len)` and directly passes
> its output buffer down to the native zlib `inflate(..)` method which
> is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the
> number of inflated bytes).
>
> From a practical point of view, I don't see this as a big problem,
> because callers of `InflaterInputStream::read(byte[] b, int off, int
> len)` can never know how many bytes will be written into the output
> buffer `b` (and in fact its content can always be completely
> overwritten). It therefore makes no sense to depend on any data there
> being untouched after the call. Also, having used zlib-cloudflare
> productively for about two years, we haven't seen real-world issues
> because of this behavior yet. However, from a specification point of
> view it is easy to artificially construct a program which violates
> `InflaterInputStream::read(..)`'s postcondition if using one of the
> alterantive zlib implementations. A recently integrated JTreg test
> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally"
> fails with zlib-chromium but could be easily fixed to run with
> alternative implementations as well.
>
> What should/can be done to solve this problem? I think we have three options:
>
> 1. Relax the specification of `InflaterInputStream::read(..)` and
> specifically note in the API documentation that a call to
> `InflaterInputStream::read(byte[] b, int off, int len)` may write more
> than *k* characters into `b` where *k* is the returned number of
> inflated bytes. Notice that there's a precedence for this approach in
> `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
> method of InputStream, returns -1 instead of zero if the end of the
> stream has been reached and len==0*".
>
> 2. Tighten the specification of `Inflater::read(byte[] b, int off, int
> len)` to explicitly forbid any writes into the output array `b` beyond
> the inflated bytes.
>
> 3. Change the implementation of `InflaterInputStream::read(..)` to
> call `Inflater::read(..)` with a temporary buffer and only copy the
> nu,ber of inflated bytes into `InflaterInputStream::read(..)`'s output
> buffer. I think we all agree that this is only a theoretic option
> because of its unacceptable performance regression.
>
> I personally favor option 1 but I'm interested in your opinions?
>
> Thank you and best regards,
> Volker
>
>
>
>
>
> Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
> Oracle Java Engineering
> 1 Network Drive
> Burlington, MA 01803
> lance.ander...@oracle.com
>
>
>

Re: RFC: JDK-8282648: Problems due to conflicting specification of Inflater::inflate(..) and InflaterInputStream::read(..)

Reply via email to