Hi Volker,

I have read through what you have provided/pointed to, thank you, and on the 
surface what you are suggesting sounds reasonable.

That being said given that this API dates back to 1997ish, I think we have to 
be careful not introduce any regressions with existing applications with the 
proposal you suggest(even though it is just relaxes the spec), as you mentioned 
their is a jtreg test that  seems to fail.

Have you run the JCK with your ZLIB implementation?  I only skimmed the tests 
but looks like there might be a couple of tests which validate the array’s 
contents.

We don’t have a lot of test coverage so if we were to consider moving forward 
with your proposal, I believe additional test coverage would be warranted.



Best
Lance


On Mar 4, 2022, at 5:04 AM, Volker Simonis 
<volker.simo...@gmail.com<mailto:volker.simo...@gmail.com>> wrote:

`java.util.zip.Inflater` is the Java wrapper class for zlib's inflater
functionality. `Inflater::inflate(byte[] output, int off, int len)`
currently calls zlib's native `inflate(..)` function and passes the
address of `output[off]` and `len` to it via JNI.

The specification of zlib's `inflate(..)` function (i.e. the [API
documentation in the original zlib
implementation](https://github.com/madler/zlib/blob/cacf7f1d4e3d44d871b605da3b647f07d718623f/zlib.h#L400))
doesn't give any guarantees with regard to usage of the output buffer.
It only states that upon completion the function will return the
number of bytes that have been written (i.e. "inflated") into the
output buffer.

The original zlib implementation only wrote as many bytes into the
output buffer as it inflated. However, this is not a hard requirement
and newer, more performant implementations of the zlib library like
[zlib-chromium](https://chromium.googlesource.com/chromium/src/third_party/zlib/)
or [zlib-cloudflare](https://github.com/cloudflare/zlib) can use more
bytes of the output buffer than they actually inflate as a scratch
buffer. See https://github.com/simonis/zlib-chromium for a more
detailed description of their approach and its performance benefit.

These new zlib versions can still be used transparently from Java
(e.g. by putting them into the `LD_LIBRARY_PATH` or by using
`LD_PRELOAD`), because they still fully comply to specification of
`Inflater::inflate(..)`. However, we might run into problems when
using the `Inflater` functionality from the `InflaterInputStream`
class. `InflaterInputStream` is derived from from `InputStream` and as
such, its `read(byte[] b, int off, int len)` method is quite
constrained. It specifically specifies that if *k* bytes have been
read, then "these bytes will be stored in elements `b[off]` through
`b[off+`*k*`-1]`, leaving elements `b[off+`*k*`]` through
`b[off+len-1]` **unaffected**". But `InflaterInputStream::read(byte[]
b, int off, int len)` (which is constrained by
`InputStream::read(..)`'s specification) calls
`Inflater::inflate(byte[] b, int off, int len)` and directly passes
its output buffer down to the native zlib `inflate(..)` method which
is free to change the bytes beyond `b[off+`*k*`]` (where *k* is the
number of inflated bytes).

From a practical point of view, I don't see this as a big problem,
because callers of `InflaterInputStream::read(byte[] b, int off, int
len)` can never know how many bytes will be written into the output
buffer `b` (and in fact its content can always be completely
overwritten). It therefore makes no sense to depend on any data there
being untouched after the call. Also, having used zlib-cloudflare
productively for about two years, we haven't seen real-world issues
because of this behavior yet. However, from a specification point of
view it is easy to artificially construct a program which violates
`InflaterInputStream::read(..)`'s postcondition if using one of the
alterantive zlib implementations. A recently integrated JTreg test
(test/jdk/jdk/nio/zipfs/ZipFSOutputStreamTest.java) "unintentionally"
fails with zlib-chromium but could be easily fixed to run with
alternative implementations as well.

What should/can be done to solve this problem? I think we have three options:

1. Relax the specification of `InflaterInputStream::read(..)` and
specifically note in the API documentation that a call to
`InflaterInputStream::read(byte[] b, int off, int len)` may write more
than *k* characters into `b` where *k* is the returned number of
inflated bytes. Notice that there's a precedence for this approach in
`java.io.ByteArrayInputStream::read(..)` which "*unlike the overridden
method of InputStream, returns -1 instead of zero if the end of the
stream has been reached and len==0*".

2. Tighten the specification of `Inflater::read(byte[] b, int off, int
len)` to explicitly forbid any writes into the output array `b` beyond
the inflated bytes.

3. Change the implementation of `InflaterInputStream::read(..)` to
call `Inflater::read(..)` with a temporary buffer and only copy the
nu,ber of inflated bytes into `InflaterInputStream::read(..)`'s output
buffer. I think we all agree that this is only a theoretic option
because of its unacceptable performance regression.

I personally favor option 1 but I'm interested in your opinions?

Thank you and best regards,
Volker

[cid:E1C4E2F0-ECD0-4C9D-ADB4-B16CA7BCB7FC@home]



Lance Andersen| Principal Member of Technical Staff | +1.781.442.2037
Oracle Java Engineering
1 Network Drive
Burlington, MA 01803
lance.ander...@oracle.com<mailto:lance.ander...@oracle.com>



Reply via email to