On Mon, 25 May 2026 13:07:36 GMT, Jaikiran Pai <[email protected]> wrote:

>> Can I please get a review of this doc and test-only change for 
>> https://bugs.openjdk.org/browse/JDK-8322256?
>> 
>> The `java.util.zip.GZIPInputStream` has been in the JDK since Java 1.1. 
>> However, its specification hasn't been clear on how it behaves, especially 
>> when a `InputStream` consists of more than one GZIP member. 
>> 
>> The commit in this PR updates the specification of this class to match its 
>> current (long standing) implementation.
>> 
>> A new jtreg test has been introduced to verify this behaviour.
>> 
>> I'll draft a CSR shortly.
>> 
>> ---------
>> - [x] I confirm that I make this contribution in accordance with the 
>> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai).
>
> Jaikiran Pai has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains 23 additional 
> commits since the last revision:
> 
>  - fix copyright year on BasicGZIPInputStreamTest.java
>  - "or after" -> "or after them"
>  - Alan's review - remove use of implSpec from GZIPInputStream.read()
>  - merge latest from master branch
>  - Alan's review - update the class level doc
>  - merge latest from master branch
>  - merge latest from master branch
>  - merge latest from master branch
>  - minor adjustments
>  - Lance's review
>  - ... and 13 more: https://git.openjdk.org/jdk/compare/4f72bbd4...bb2af7c1

Hello Volker,

> Now we all know that the `GZIPInputStream` implementation was wrong and under 
> some circumstances `GZIPInputStream::read()` could return early and ignore 
> existing, concatenated "compressed members" 
> ([JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)). This problem 
> has been fixed in JDK 23 and we have downported the fix to OpenJDK 17 and 21 
> because we have seen a silent data loss in the 
> [Ion](https://amazon-ion.github.io/ion-docs/) open source serialization 
> library which is heavily used within Amazon.

GZIPInputStream has been using InputStream.available() to decide whether or not 
to read additional data past a GZIP member for several decades. Like you note, 
in certain applications/libraries that then means that if 
InputStream.available() returned `0` then additional data may not be read. 
There's an additional factor to this which relates to an implementation detail 
of decompression, but for the sake of the current discussion, I am focusing on 
the InputStream.available() part. Because when you say ignoring existing data, 
I'm guessing you are talking about how the implementation relies on the result 
of InputStream.available() to be non-zero. Is that what you mean by silent data 
loss when used in Ion library?

After the change in JDK-7036144 was integrated, this potential data loss was 
cited as a reason why backporting JDK-7036144 to OpenJDK update releases was 
necessary. However, the original backporter did not provide any further details 
about it. It's not always possible to provide those details when the issue only 
happens with some customer data. However, if there's an issue that's 
reproducible with an open source library (Ion?) then it would be good to create 
an JBS issue explaining the problem and attach a reproducer to that. Do note 
that the changes done in JDK-7036144 weren't motivated by any (reported) data 
loss issues.

As for the reason why we reverted the JDK-7036144 changes, it was because there 
have applications that have been impacted by regressions (performance as well 
as functional, I believe JDK-8377896 is a functional issue) caused by the 
change to the long standing behaviour of that code. The current behaviour of 
several of these java.util.zip area APIs has been around for far too long (I 
realize you probably already know that, but I note it here for the sake of 
wider audience). The decisions that were made for these implementations during 
those days may not seem appropriate today. When dealing with such code, our 
approach has been to retain the backward compatibility as much as possible, 
even if those are implementation details and not API specifications. It's no 
different here and that's why we decided to undo the changes in JDK-7036144. In 
fact, in hindsight we should probably have just retained the current behaviour 
and solved the issue in a different manner when we first integrated th
 e changes in JDK-7036144. Although a usage search of GZIPInputStream was run 
to identify potential usages that may run into issues due to this change, 
clearly it wasn't enough. Not just that, it now clearly shows that we lack test 
coverage for `GZIPInputStream`.

The current PR isn't trying to solve any bug or introduce any new behaviour to 
`GZIPInputStream`. It merely documents the current implementation and specifies 
the behaviour to whatever extent it makes sense to specify it. We have done a 
similar thing for several other classes in the JDK as and when we have touched 
that area of the code.


> In my opinion, the JBS issue for this PR (i.e. 
> [JDK-8322256](https://bugs.openjdk.org/browse/JDK-8322256)) suggests one 
> possible way to fix this issue:
>
> ....

There are different ways to solve this, yes. In fact, a month or so back, I had 
done a proof of concept of an alternate implementation which would allow 
applications/libraries to have more control over how this class behaves when it 
comes to detecting additional GZIP members in the stream. But that's just one 
minor detail of all of this. Without documenting the current behaviour of this 
class, adding more code to it will make it more susceptible to issues that we 
may not have considered.


I realize the current implementation in this class can be problematic in 
certain cases. My suggestion is to create a separate bug(s) (hopefully with a 
reproducer) that shows what problems Ion library runs into. Like any other API 
in the JDK, we can then evaluate how to address it and whether it's worth 
introducing a change in this class which while allowing applications to 
continue to rely on the current behaviour will also solve the issue for other 
applications which desire a different behaviour.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/30925#issuecomment-4535751213

Reply via email to