Ping... On Thu, Mar 10, 2022 at 8:26 PM Lance Andersen <lance.ander...@oracle.com> wrote:
> Hi Volker, > > Thank you for the reminder > > This is on my radar as well but have not had a chance spend any time on > this as yet. > > > > On Mar 9, 2022, at 2:33 PM, Volker Simonis <volker.simo...@gmail.com> > wrote: > > @Alan, @Lance, > > Sorry for my obtrusiveness, but what's your opinion on this issue? > > Thank you and best regards, > Volker > > On Fri, Mar 4, 2022 at 11:04 AM Volker Simonis <volker.simo...@gmail.com> > wrote: > > > `java.util.zip.Inflater` is the Java wrapper class for zlib's inflater > functionality. `Inflater::inflate(byte[] output, int off, int len)` > currently calls zlib's native `inflate(..)` function and passes the > address of `output[off]` and `len` to it via JNI. > > The specification of zlib's `inflate(..)` function (i.e. the [API > documentation in the original zlib > implementation]( > https://urldefense.com/v3/__https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h*L400__;Iw!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0w3u0Q7HA$ > )) > doesn't give any guarantees with regard to usage of the output buffer. > It only states that upon completion the function will return the > number of bytes that have been written (i.e. "inflated") into the > output buffer. > > The original zlib implementation only wrote as many bytes into the > output buffer as it inflated. However, this is not a hard requirement > and newer, more performant implementations of the zlib library like > [zlib-chromium]( > https://urldefense.com/v3/__https://chromium.googlesource.com/chromium/src/third_party/zlib/__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0yNMBtuew$ > ) > or [zlib-cloudflare]( > https://urldefense.com/v3/__https://github.com/cloudflare/zlib__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0y7Uhi8nA$ > ) can use more > bytes of the output buffer than they actually inflate as a scratch > buffer. See > https://urldefense.com/v3/__https://github.com/simonis/zlib-chromium__;!!ACWV5N9M2RV99hQ!fpVcYO7aobaYs-KfF-KOOiQtV17pdX5BNzn8ZkU0UX9B2Lj2Pcs1uh4mh0z1qVlYPg$ > for a more > detailed description of their approach and its performance benefit. > > These new zlib versions can still be used transparently from Java > (e.g. by putting them into the `LD_LIBRARY_PATH` or by using > `LD_PRELOAD`), because they still fully comply to specification of > `Inflater::inflate(..)`. However, we might run into problems when > using the `Inflater` functionality from the `InflaterInputStream` > class. `InflaterInputStream` is derived from from `InputStream` and as > such, its `read(byte[] b, int off, int len)` method is quite > constrained. It specifically specifies that if *k* bytes have been > read, then "these bytes will be stored in elements `b[off]` through > `b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through > `b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[] > b, int off, int len)` (which is constrained by > `InputStream::read(..)`'s specification) calls > `Inflater::inflate(byte[] b, int off, int len)` and directly passes > its output buffer down to the native zlib `inflate(..)` method which > is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the > number of inflated bytes). > > From a practical point of view, I don't see this as a big problem, > because callers of `InflaterInputStream::read(byte[] b, int off, int > len)` can never know how many bytes will be written into the output > buffer `b` (and in fact its content can always be completely > overwritten). It therefore makes no sense to depend on any data there > being untouched after the call. Also, having used zlib-cloudflare > productively for about two years, we haven't seen real-world issues > because of this behavior yet. However, from a specification point of > view it is easy to artificially construct a program which violates > `InflaterInputStream::read(..)`'s postcondition if using one of the > alterantive zlib implementations. A recently integrated JTreg test > (test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally" > fails with zlib-chromium but could be easily fixed to run with > alternative implementations as well. > > What should/can be done to solve this problem? I think we have three > options: > > 1. Relax the specification of `InflaterInputStream::read(..)` and > specifically note in the API documentation that a call to > `InflaterInputStream::read(byte[] b, int off, int len)` may write more > than *k* characters into `b` where *k* is the returned number of > inflated bytes. Notice that there's a precedence for this approach in > `java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden > method of InputStream, returns -1 instead of zero if the end of the > stream has been reached and len==0*". > > 2. Tighten the specification of `Inflater::read(byte[] b, int off, int > len)` to explicitly forbid any writes into the output array `b` beyond > the inflated bytes. > > 3. Change the implementation of `InflaterInputStream::read(..)` to > call `Inflater::read(..)` with a temporary buffer and only copy the > nu,ber of inflated bytes into `InflaterInputStream::read(..)`'s output > buffer. I think we all agree that this is only a theoretic option > because of its unacceptable performance regression. > > I personally favor option 1 but I'm interested in your opinions? > > Thank you and best regards, > Volker > > > > > > Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037 > Oracle Java Engineering > 1 Network Drive > Burlington, MA 01803 > lance.ander...@oracle.com > > > >