On Mon, 9 May 2022 09:56:19 GMT, Volker Simonis <simo...@openjdk.org> wrote:

>> Add an API note to `InflaterInputStream::read(byte[] b, int off, int len)` 
>> to highlight that it might  write more bytes than the returned number of 
>> inflated bytes into the buffer `b`.
>> The superclass `java.io.InputStream` specifies that `read(byte[] b, int off, 
>> int len)` will leave the content beyond the last read byte in the read 
>> buffer `b` unaffected. However, the overridden `read` method in 
>> `InflaterInputStream` passes the read buffer `b` to 
>> `Inflater::inflate(byte[] b, int off, int len)` which doesn't provide this 
>> guarantee. Depending on implementation details, `Inflater::inflate` might 
>> write more than the returned number of inflated bytes into the buffer `b`.
>> ### TL;DR
>> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater 
>> functionality. `Inflater::inflate(byte[] output, int off, int len)` 
>> currently calls zlib's native `inflate(..)` function and passes the address 
>> of `output[off]` and `len` to it via JNI.
>> The specification of zlib's `inflate(..)` function (i.e. the [API 
>> documentation in the original zlib 
>> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400))
>>  doesn't give any guarantees with regard to usage of the output buffer. It 
>> only states that upon completion the function will return the number of 
>> bytes that have been written (i.e. "inflated") into the output buffer.
>> The original zlib implementation only wrote as many bytes into the output 
>> buffer as it inflated. However, this is not a hard requirement and newer, 
>> more performant implementations of the zlib library like 
>> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/)
>>  or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes 
>> of the output buffer than they actually inflate as a scratch buffer. See 
>> https://github.com/simonis/zlib-chromium for a more detailed description of 
>> their approach and its performance benefit.
>> These new zlib versions can still be used transparently from Java (e.g. by 
>> putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because 
>> they still fully comply to specification of `Inflater::inflate(..)`. 
>> However, we might run into problems when using the `Inflater` functionality 
>> from the `InflaterInputStream` class. `InflaterInputStream` is derived from 
>> from `InputStream` and as such, its `read(byte[] b, int off, int len)` 
>> method is quite constrained. It specifically specifies that if *k* bytes 
>> have been read, then "these bytes will be stored in elements `b[off]` 
>> through `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through 
>> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] b, int 
>> off, int len)` (which is constrained by `InputStream::read(..)`'s 
>> specification) calls `Inflater::inflate(byte[] b, int off, int len)` and 
>> directly passes its output buffer down to the native zlib `inflate(..)` 
>> method which is free to change the bytes beyond `b[off+`
 *k*`]` (where *k* is the number of inflated bytes).
>> From a practical point of view, I don't see this as a big problem, because 
>> callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never 
>> know how many bytes will be written into the output buffer `b` (and in fact 
>> its content can always be completely overwritten). It therefore makes no 
>> sense to depend on any data there being untouched after the call. Also, 
>> having used zlib-cloudflare productively for about two years, we haven't 
>> seen real-world issues because of this behavior yet. However, from a 
>> specification point of view it is easy to artificially construct a program 
>> which violates `InflaterInputStream::read(..)`'s postcondition if using one 
>> of the alterantive zlib implementations. A recently integrated JTreg test 
>> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails 
>> with zlib-chromium but can fixed easily to run with alternative 
>> implementations as well (see 
>> [JDK-8283756](https://bugs.openjdk.java.net/browse/JDK-8283756)).
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
>   Updated wording based on @JoeDarcy's third CSR review

I've uploaded yet another version of this PR based on @jddarcy latest 
[CSR](https://bugs.openjdk.java.net/browse/JDK-8283758) review. This brings 
back all the changes to the API documentation of `InflaterInputStream`, 
`GZipInputStream` and `ZipInputStream` with an additional note in the API doc 
for `InputStream::read()` which indicates that subclasses of `InputStream` are 
free to relax the strong requirements imposed by `InputStream::read()`.

Hopefully this version finally gets some consensus :)


PR: https://git.openjdk.java.net/jdk/pull/7986

Reply via email to