On Mon, 25 May 2026 16:36:51 GMT, Jaikiran Pai <[email protected]> wrote:
> Because when you say ignoring existing data, I'm guessing you are talking > about how the implementation relies on the result of InputStream.available() > to be non-zero. Is that what you mean by silent data loss when used in Ion > library? When talking about "*ignoring existing data*" and "*silent data loss*" I mean the fact that applications which rightfully rely on the documented promise of `GZIPInputStream` to implement RFC 1952 and specifically support for concatenated GZIP streams, can fail to read compressed GZIP stream members which follow the very first one. This failure happens **silently** because `GZIPInputStream` pretends that it reached the end of the stream and **randomly** because the failure can depend on factors like the size of the first compressed member and the buffering strategy of the underlying input stream. When we first detected this malfunctioning in production, we considered it as a serious bug, which is the reason why we immediately downported its fix [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144) to OpenJDK 17 and 21. > I realize the current implementation in this class can be problematic in > certain cases. My suggestion is to create a separate bug(s) (hopefully with a > reproducer) that shows what problems Ion library runs into. Like any other > API in the JDK, we can then evaluate how to address it and whether it's worth > introducing a change in this class which while allowing applications to > continue to rely on the current behaviour will also solve the issue for other > applications which desire a different behaviour. There's no need for a new bug report, because [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144) already describes the issue very well and already comes with a simple reproducer. As that bug already describes since ~15 years: "*Depending on member alignment and network/IO issues, GZIPInnputStream may erroneously end early and inconsistently*". Any higher level library, like e.g. Ion, which wraps custom input stream into a `GZIPInputStream` in order to support compression, is affected by this issue. We were shocked when we realized that [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144) (reported against JDK 6) was open for 13 year, before it finally got fixed in JDK 23. > As for the reason why we reverted the JDK-7036144 changes, it was because > there are applications that have been impacted by regressions (performance as > well as functional, I believe JDK-8377896 is a functional issue) caused by > the change to the long standing behaviour of that code. What do you mean by "*functional issue*"? I think it is normal, that every higher level protocol that reads data from a stream on a lower level protocol, can rightfully expect that EOF is handled correctly on the lower level, and if it doesn't, then that's an issue on the lower protocol level, and has to be handled there. As I wrote, the issue described in [JDK-8377896](https://bugs.openjdk.org/browse/JDK-8377896) could easily be fixed by closing the connection on the socket level if no more data follows. Once again, I think it is much more important, that a client has the ability to correctly read the complete payload it was sent rather then terminating the read operation quickly at the risk of missing some trailing data from the payload. > The current PR isn't trying to solve any bug or introduce any new behaviour > to GZIPInputStream. It merely documents the current implementation and > specifies the behaviour to whatever extent it makes sense to specify it. We > have done a similar thing for several other classes in the JDK as and when we > have touched that area of the code. Again, this PR makes the situation worse, because it more prominently highlights the support for concatenated GZIP streams according to RFC 1952 while it doesn't say anything about the inherent danger of silently and randomly dropping some of those concatenated stream members. Instead, it merely states that `InputStream::available()` is used to check for additional members. As I wrote, no user of this API document will understand the implication of this sentence, especially not the consequence that it can lead to truncated reads. Even worse, many users of this method won't be able to verify the implementation of `::available()` in the underlying input stream, so even if they would be aware of the immanent risk, they can't confirm or exclude it for their particular use case. ------------- PR Comment: https://git.openjdk.org/jdk/pull/30925#issuecomment-4544741187
