On Wed, 7 Jan 2026 14:08:02 GMT, Jaikiran Pai <[email protected]> wrote:

>> Can I please get a review of this change which proposes to address a 
>> performance regression in `java.util.zip.GZIPInputStream`? 
>> 
>> In JDK 23, we updated `GZIPInputStream` to no longer rely on 
>> `InputStream.available()` method when determining if a `InputStream` 
>> consists of additional (concatenated) GZIP streams. That was done in 
>> https://bugs.openjdk.org/browse/JDK-7036144. That change replaced the use of 
>> `InputStream.available()` with explicit call(s) to `InputStream.read()` to 
>> detect and read any additional concatenated GZIP stream header.
>> 
>> Given how old that existing code in `GZIPInputStream` was, efforts were made 
>> to ensure that it wouldn't break existing applications, for the common use 
>> case. A release note too was added in an effort to highlight this change.
>> 
>> Unfortunately, when reviewing and experimenting with this change I didn't 
>> realize that when we try to detect additional bytes in the stream which has 
>> already reached EOF, the `EOFException` that gets raised by this code could 
>> contribute to a performance degradation. The code does handle this exception 
>> appropriately, but it does mean that in the common case of a stream formed 
>> of just one GZIP stream, we would now end up constructing an `EOFException` 
>> in this code path.
>> 
>> Construction of `Throwable` instances involves filling the stacktrace into 
>> that instance and is a relatively expensive operation. As a result, we ended 
>> up regressing the performance of this code path for the common use case of 
>> parsing a non-concatenated GZIP stream.
>> 
>> The commit in this PR addresses this issue by introducing an alternate code 
>> path in the (private) `readHeader()` method to prevent an `EOFException` 
>> from being generated when the stream has already reached EOF and we are 
>> trying to determine if there's any additional concatenated GZIP stream 
>> header in the stream. This addresses the performance regression as well as 
>> lets us retain the changes that we had done in JDK-7036144 to prevent the 
>> usage of `InputStream.available()`.
>> 
>> The reporter of this performance regression provided a JMH benchmark. I've 
>> run it against JDK 22 (the version that doesn't contain the JDK-7036144 
>> change) , JDK 25 (latest JDK version which contains the JDK-7036144 change) 
>> and with this change proposed in this PR. I can see that with this change, 
>> the performance numbers improve and go back to the JDK 22 levels. I haven't 
>> included that JMH benchmark in this PR and am looking for inputs on whether 
>> we should include it.
>> 
>> No n...
>
> Jaikiran Pai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   use -1 to represent absence of a GZIP header, from readHeader() method

Good

-------------

Marked as reviewed by alanb (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/29092#pullrequestreview-3636361111

Reply via email to