Currently, `InflaterInputStream::read()` first does a native call to the 
underlying zlib `inflate()` function and only afterwards checks if the inflater 
requires input (i.e. `Inflater::needsInput()`) or has finished inflating 
(`Inflater::finished()`). This leads to an unnecessary native call to 
`inflate()` when `InflaterInputStream::read()` is invoked for the very first 
time because `Inflater::fill()` hasn't been called and another unnecessary 
native call to detect the end of the stream. For small streams/files which 
completely fit into the output buffer passed to `InflaterInputStream::read()` 
we can therefore save two out of three native calls. The attached micro 
benchmark shows that this results in a 5%-10% performance improvement for zip 
files of sizes between 4096 to 256 bytes (notice that in common jars like 
Tomcat, Spring-Boot, Maven, Jackson, etc. about 60-80% of the classes are 
smaller than 4096 bytes).


before JDK-8281962
------------------
Benchmark                                     (size)  Mode  Cnt  Score   Error  
Units
InflaterInputStreams.inflaterInputStreamRead     256  avgt    5  2.571 ± 0.120  
us/op
InflaterInputStreams.inflaterInputStreamRead     512  avgt    5  2.861 ± 0.064  
us/op
InflaterInputStreams.inflaterInputStreamRead    4096  avgt    5  5.110 ± 0.278  
us/op

after JDK-8281962
-----------------
Benchmark                                     (size)  Mode  Cnt  Score   Error  
Units
InflaterInputStreams.inflaterInputStreamRead     256  avgt    5  2.332 ± 0.081  
us/op
InflaterInputStreams.inflaterInputStreamRead     512  avgt    5  2.691 ± 0.293  
us/op
InflaterInputStreams.inflaterInputStreamRead    4096  avgt    5  4.812 ± 1.038  
us/op


Tested with the JTreg zip/jar/zipfs and the JCK zip/jar tests.

As a side effect, this change also fixes an issue with alternative zlib 
implementations like zlib-Chromium or zlib-Cloudflare which pad the inflated 
bytes with a specif byte pattern at the end of the output for debugging 
purpose. This breaks code patterns like the following:


int readcount = 0;
while ((bytesRead = inflaterInputStream.read(data, 0, bufferSize)) != -1) {
    outputStream.write(data, 0, bytesRead);
    readCount++;
}
if (readCount == 1) {
    return data;         //  <---- first bytes might be overwritten
}
return outputStream.toByteArray();


Even if the whole data fits into the `data` array during the first call to 
`inflaterInputStream.read()`, we still need a second call to 
`inflaterInputStream.read()` in order to detect the end of the inflater stream. 
If this second call calls into the native `inflate()` function of 
Cloudflare/Chromium's zlib version, this will still write some padding bytes at 
the beginning of the `data` array and overwrite the previously read data. This 
issue has been reported in Spring [1] and ASM [2] when using these libraries 
with Cloudflares zlib version (by setting `LD_LIBRARY_PATH` accordingly).

[1] https://github.com/spring-projects/spring-framework/issues/27429
[2] https://gitlab.ow2.org/asm/asm/-/issues/317955

-------------

Commit messages:
 - 8281962: Avoid unnecessary native calls in InflaterInputStream

Changes: https://git.openjdk.java.net/jdk/pull/7492/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=7492&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8281962
  Stats: 114 lines in 2 files changed: 112 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/7492.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/7492/head:pull/7492

PR: https://git.openjdk.java.net/jdk/pull/7492

Reply via email to