On Wed, 7 Jan 2026 13:27:43 GMT, Jaikiran Pai <[email protected]> wrote:
> Can I please get a review of this change which proposes to address a > performance regression in `java.util.zip.GZIPInputStream`? > > In JDK 23, we updated `GZIPInputStream` to no longer rely on > `InputStream.available()` method when determining if a `InputStream` consists > of additional (concatenated) GZIP streams. That was done in > https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of > `InputStream.available()` with explicit call(s) to `InputStream.read()` to > detect and read any additional concatenated GZIP stream header. > > Given how old that existing code in `GZIPInputStream` was, efforts were made > to ensure that it wouldn't break existing applications, for the common use > case. A release note too was added in an effort to highlight this change. > > Unfortunately, when reviewing and experimenting with this change I didn't > realize that when we try to detect additional bytes in the stream which has > already reached EOF, the `EOFException` that gets raised by this code could > contribute to a performance degradation. The code does handle this exception > appropriately, but it does mean that in the common case of a stream formed of > just one GZIP stream, we would now end up constructing an `EOFException` in > this code path. > > Construction of `Throwable` instances involves filling the stacktrace into > that instance and is a relatively expensive operation. As a result, we ended > up regressing the performance of this code path for the common use case of > parsing a non-concatenated GZIP stream. > > The commit in this PR addresses this issue by introducing an alternate code > path in the (private) `readHeader()` method to prevent an `EOFException` from > being generated when the stream has already reached EOF and we are trying to > determine if there's any additional concatenated GZIP stream header in the > stream. This addresses the performance regression as well as lets us retain > the changes that we had done in JDK-7036144 to prevent the usage of > `InputStream.available()`. > > The reporter of this performance regression provided a JMH benchmark. I've > run it against JDK 22 (the version that doesn't contain the JDK-7036144 > change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) > and with this change proposed in this PR. I can see that with this change, > the performance numbers improve and go back to the JDK 22 levels. I haven't > included that JMH benchmark in this PR and am looking for inputs on whether > we should include it. > > No new regression test has been in... This pull request has now been integrated. Changeset: a4fb07ee Author: Jaikiran Pai <[email protected]> URL: https://git.openjdk.org/jdk/commit/a4fb07ee3e26c2f0ed3111c39c3a22167d292d04 Stats: 49 lines in 1 file changed: 41 ins; 1 del; 7 mod 8374644: Regression in GZIPInputStream performance after JDK-7036144 Reviewed-by: lancea, alanb ------------- PR: https://git.openjdk.org/jdk/pull/29092
