On Wed, 7 Jan 2026 14:08:02 GMT, Jaikiran Pai <[email protected]> wrote:
>> Can I please get a review of this change which proposes to address a >> performance regression in `java.util.zip.GZIPInputStream`? >> >> In JDK 23, we updated `GZIPInputStream` to no longer rely on >> `InputStream.available()` method when determining if a `InputStream` >> consists of additional (concatenated) GZIP streams. That was done in >> https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of >> `InputStream.available()` with explicit call(s) to `InputStream.read()` to >> detect and read any additional concatenated GZIP stream header. >> >> Given how old that existing code in `GZIPInputStream` was, efforts were made >> to ensure that it wouldn't break existing applications, for the common use >> case. A release note too was added in an effort to highlight this change. >> >> Unfortunately, when reviewing and experimenting with this change I didn't >> realize that when we try to detect additional bytes in the stream which has >> already reached EOF, the `EOFException` that gets raised by this code could >> contribute to a performance degradation. The code does handle this exception >> appropriately, but it does mean that in the common case of a stream formed >> of just one GZIP stream, we would now end up constructing an `EOFException` >> in this code path. >> >> Construction of `Throwable` instances involves filling the stacktrace into >> that instance and is a relatively expensive operation. As a result, we ended >> up regressing the performance of this code path for the common use case of >> parsing a non-concatenated GZIP stream. >> >> The commit in this PR addresses this issue by introducing an alternate code >> path in the (private) `readHeader()` method to prevent an `EOFException` >> from being generated when the stream has already reached EOF and we are >> trying to determine if there's any additional concatenated GZIP stream >> header in the stream. This addresses the performance regression as well as >> lets us retain the changes that we had done in JDK-7036144 to prevent the >> usage of `InputStream.available()`. >> >> The reporter of this performance regression provided a JMH benchmark. I've >> run it against JDK 22 (the version that doesn't contain the JDK-7036144 >> change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) >> and with this change proposed in this PR. I can see that with this change, >> the performance numbers improve and go back to the JDK 22 levels. I haven't >> included that JMH benchmark in this PR and am looking for inputs on whether >> we should include it. >> >> No n... > > Jaikiran Pai has updated the pull request incrementally with one additional > commit since the last revision: > > use -1 to represent absence of a GZIP header, from readHeader() method Good ------------- Marked as reviewed by alanb (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/29092#pullrequestreview-3636361111
