On Wed, 6 Apr 2022 16:28:21 GMT, Volker Simonis <simo...@openjdk.org> wrote:
>> Add an API note to `InflaterInputStream::read(byte[] b, int off, int len)` >> to highlight that it might write more bytes than the returned number of >> inflated bytes into the buffer `b`. >> >> The superclass `java.io.InputStream` specifies that `read(byte[] b, int off, >> int len)` will leave the content beyond the last read byte in the read >> buffer `b` unaffected. However, the overridden `read` method in >> `InflaterInputStream` passes the read buffer `b` to >> `Inflater::inflate(byte[] b, int off, int len)` which doesn't provide this >> guarantee. Depending on implementation details, `Inflater::inflate` might >> write more than the returned number of inflated bytes into the buffer `b`. >> >> ### TL;DR >> >> `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater >> functionality. `Inflater::inflate(byte[] output, int off, int len)` >> currently calls zlib's native `inflate(..)` function and passes the address >> of `output[off]` and `len` to it via JNI. >> >> The specification of zlib's `inflate(..)` function (i.e. the [API >> documentation in the original zlib >> implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400)) >> doesn't give any guarantees with regard to usage of the output buffer. It >> only states that upon completion the function will return the number of >> bytes that have been written (i.e. "inflated") into the output buffer. >> >> The original zlib implementation only wrote as many bytes into the output >> buffer as it inflated. However, this is not a hard requirement and newer, >> more performant implementations of the zlib library like >> [zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/) >> or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more bytes >> of the output buffer than they actually inflate as a scratch buffer. See >> https://github.com/simonis/zlib-chromium for a more detailed description of >> their approach and its performance benefit. >> >> These new zlib versions can still be used transparently from Java (e.g. by >> putting them into the `LD_LIBRARY_PATH` or by using `LD_PRELOAD`), because >> they still fully comply to specification of `Inflater::inflate(..)`. >> However, we might run into problems when using the `Inflater` functionality >> from the `InflaterInputStream` class. `InflaterInputStream` is derived from >> from `InputStream` and as such, its `read(byte[] b, int off, int len)` >> method is quite constrained. It specifically specifies that if *k* bytes >> have been read, then "these bytes will be stored in elements `b[off]` >> through `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through >> `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] b, int >> off, int len)` (which is constrained by `InputStream::read(..)`'s >> specification) calls `Inflater::inflate(byte[] b, int off, int len)` and >> directly passes its output buffer down to the native zlib `inflate(..)` >> method which is free to change the bytes beyond `b[off+` *k*`]` (where *k* is the number of inflated bytes). >> >> From a practical point of view, I don't see this as a big problem, because >> callers of `InflaterInputStream::read(byte[] b, int off, int len)` can never >> know how many bytes will be written into the output buffer `b` (and in fact >> its content can always be completely overwritten). It therefore makes no >> sense to depend on any data there being untouched after the call. Also, >> having used zlib-cloudflare productively for about two years, we haven't >> seen real-world issues because of this behavior yet. However, from a >> specification point of view it is easy to artificially construct a program >> which violates `InflaterInputStream::read(..)`'s postcondition if using one >> of the alterantive zlib implementations. A recently integrated JTreg test >> (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" fails >> with zlib-chromium but can fixed easily to run with alternative >> implementations as well (see >> [JDK-8283756](https://bugs.openjdk.java.net/browse/JDK-8283756)). > > Volker Simonis has updated the pull request incrementally with one additional > commit since the last revision: > > Added API-refinement to GZIPInputStream::read()/ZipInputStream::read() and > an Implementation note to ZipFile::getInputStream() The `GZIPInputStream` and `ZipInputStream` files will need a copyright year update. src/java.base/share/classes/java/util/zip/ZipInputStream.java line 173: > 171: * bytes are read and {@code 0} is returned. > 172: * <p> > 173: * If <i>n</i> denotes the returned number of inflated bytes then > {@code b[off]} I think this might need a slightly different wording since in this case of `ZipInputStream#read(...)`, the entry's compression method decides whether or not an inflater will be used to return inflated bytes. So maybe we should just remove the reference to "inflated bytes" and instead say: * If <i>n</i> denotes the return value, then {@code b[off]} through {@code b[off+}<i>n</i>{@code -1]} * will contain the data that has been read. The elements {@code b[off+}<i>n</i>{@code ]} through * {@code b[off+}<i>len</i>{@code -1]} are undefined and the implementation may update them during this read * operation. If the return value is -1 or an exception is thrown the whole content of {@code b} is undefined. ------------- PR: https://git.openjdk.java.net/jdk/pull/7986