> Can I please get a review of this change which proposes to address a > performance regression in `java.util.zip.GZIPInputStream`? > > In JDK 23, we updated `GZIPInputStream` to no longer rely on > `InputStream.available()` method when determining if a `InputStream` consists > of additional (concatenated) GZIP streams. That was done in > https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of > `InputStream.available()` with explicit call(s) to `InputStream.read()` to > detect and read any additional concatenated GZIP stream header. > > Given how old that existing code in `GZIPInputStream` was, efforts were made > to ensure that it wouldn't break existing applications, for the common use > case. A release note too was added in an effort to highlight this change. > > Unfortunately, when reviewing and experimenting with this change I didn't > realize that when we try to detect additional bytes in the stream which has > already reached EOF, the `EOFException` that gets raised by this code could > contribute to a performance degradation. The code does handle this exception > appropriately, but it does mean that in the common case of a stream formed of > just one GZIP stream, we would now end up constructing an `EOFException` in > this code path. > > Construction of `Throwable` instances involves filling the stacktrace into > that instance and is a relatively expensive operation. As a result, we ended > up regressing the performance of this code path for the common use case of > parsing a non-concatenated GZIP stream. > > The commit in this PR addresses this issue by introducing an alternate code > path in the (private) `readHeader()` method to prevent an `EOFException` from > being generated when the stream has already reached EOF and we are trying to > determine if there's any additional concatenated GZIP stream header in the > stream. This addresses the performance regression as well as lets us retain > the changes that we had done in JDK-7036144 to prevent the usage of > `InputStream.available()`. > > The reporter of this performance regression provided a JMH benchmark. I've > run it against JDK 22 (the version that doesn't contain the JDK-7036144 > change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) > and with this change proposed in this PR. I can see that with this change, > the performance numbers improve and go back to the JDK 22 levels. I haven't > included that JMH benchmark in this PR and am looking for inputs on whether > we should include it. > > No new regression test has been in...
Jaikiran Pai has updated the pull request incrementally with one additional commit since the last revision: use -1 to represent absence of a GZIP header, from readHeader() method ------------- Changes: - all: https://git.openjdk.org/jdk/pull/29092/files - new: https://git.openjdk.org/jdk/pull/29092/files/9c2cf68b..06e7b370 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=01 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=29092&range=00-01 Stats: 4 lines in 1 file changed: 0 ins; 0 del; 4 mod Patch: https://git.openjdk.org/jdk/pull/29092.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/29092/head:pull/29092 PR: https://git.openjdk.org/jdk/pull/29092
