On Mon, 25 May 2026 13:07:36 GMT, Jaikiran Pai <[email protected]> wrote:
>> Can I please get a review of this doc and test-only change for >> https://bugs.openjdk.org/browse/JDK-8322256? >> >> The `java.util.zip.GZIPInputStream` has been in the JDK since Java 1.1. >> However, its specification hasn't been clear on how it behaves, especially >> when a `InputStream` consists of more than one GZIP member. >> >> The commit in this PR updates the specification of this class to match its >> current (long standing) implementation. >> >> A new jtreg test has been introduced to verify this behaviour. >> >> I'll draft a CSR shortly. >> >> --------- >> - [x] I confirm that I make this contribution in accordance with the >> [OpenJDK Interim AI Policy](https://openjdk.org/legal/ai). > > Jaikiran Pai has updated the pull request with a new target base due to a > merge or a rebase. The incremental webrev excludes the unrelated changes > brought in by the merge/rebase. The pull request contains 23 additional > commits since the last revision: > > - fix copyright year on BasicGZIPInputStreamTest.java > - "or after" -> "or after them" > - Alan's review - remove use of implSpec from GZIPInputStream.read() > - merge latest from master branch > - Alan's review - update the class level doc > - merge latest from master branch > - merge latest from master branch > - merge latest from master branch > - minor adjustments > - Lance's review > - ... and 13 more: https://git.openjdk.org/jdk/compare/4f72bbd4...bb2af7c1 Hello Volker, > Now we all know that the `GZIPInputStream` implementation was wrong and under > some circumstances `GZIPInputStream::read()` could return early and ignore > existing, concatenated "compressed members" > ([JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)). This problem > has been fixed in JDK 23 and we have downported the fix to OpenJDK 17 and 21 > because we have seen a silent data loss in the > [Ion](https://amazon-ion.github.io/ion-docs/) open source serialization > library which is heavily used within Amazon. GZIPInputStream has been using InputStream.available() to decide whether or not to read additional data past a GZIP member for several decades. Like you note, in certain applications/libraries that then means that if InputStream.available() returned `0` then additional data may not be read. There's an additional factor to this which relates to an implementation detail of decompression, but for the sake of the current discussion, I am focusing on the InputStream.available() part. Because when you say ignoring existing data, I'm guessing you are talking about how the implementation relies on the result of InputStream.available() to be non-zero. Is that what you mean by silent data loss when used in Ion library? After the change in JDK-7036144 was integrated, this potential data loss was cited as a reason why backporting JDK-7036144 to OpenJDK update releases was necessary. However, the original backporter did not provide any further details about it. It's not always possible to provide those details when the issue only happens with some customer data. However, if there's an issue that's reproducible with an open source library (Ion?) then it would be good to create an JBS issue explaining the problem and attach a reproducer to that. Do note that the changes done in JDK-7036144 weren't motivated by any (reported) data loss issues. As for the reason why we reverted the JDK-7036144 changes, it was because there have applications that have been impacted by regressions (performance as well as functional, I believe JDK-8377896 is a functional issue) caused by the change to the long standing behaviour of that code. The current behaviour of several of these java.util.zip area APIs has been around for far too long (I realize you probably already know that, but I note it here for the sake of wider audience). The decisions that were made for these implementations during those days may not seem appropriate today. When dealing with such code, our approach has been to retain the backward compatibility as much as possible, even if those are implementation details and not API specifications. It's no different here and that's why we decided to undo the changes in JDK-7036144. In fact, in hindsight we should probably have just retained the current behaviour and solved the issue in a different manner when we first integrated th e changes in JDK-7036144. Although a usage search of GZIPInputStream was run to identify potential usages that may run into issues due to this change, clearly it wasn't enough. Not just that, it now clearly shows that we lack test coverage for `GZIPInputStream`. The current PR isn't trying to solve any bug or introduce any new behaviour to `GZIPInputStream`. It merely documents the current implementation and specifies the behaviour to whatever extent it makes sense to specify it. We have done a similar thing for several other classes in the JDK as and when we have touched that area of the code. > In my opinion, the JBS issue for this PR (i.e. > [JDK-8322256](https://bugs.openjdk.org/browse/JDK-8322256)) suggests one > possible way to fix this issue: > > .... There are different ways to solve this, yes. In fact, a month or so back, I had done a proof of concept of an alternate implementation which would allow applications/libraries to have more control over how this class behaves when it comes to detecting additional GZIP members in the stream. But that's just one minor detail of all of this. Without documenting the current behaviour of this class, adding more code to it will make it more susceptible to issues that we may not have considered. I realize the current implementation in this class can be problematic in certain cases. My suggestion is to create a separate bug(s) (hopefully with a reproducer) that shows what problems Ion library runs into. Like any other API in the JDK, we can then evaluate how to address it and whether it's worth introducing a change in this class which while allowing applications to continue to rely on the current behaviour will also solve the issue for other applications which desire a different behaviour. ------------- PR Comment: https://git.openjdk.org/jdk/pull/30925#issuecomment-4535751213
