On Thu, 6 Jun 2024 17:03:57 GMT, Archie Cobbs <aco...@openjdk.org> wrote:
>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data >> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In >> order to do this, after a GZIP trailer frame is read, it attempts to read a >> GZIP header frame and, if successful, proceeds onward to decompress the new >> stream. If the attempt to decode a GZIP header frame fails, or happens to >> trigger an `IOException`, it just ignores the trailing garbage and/or the >> `IOException` and returns EOF. >> >> There are several issues with this: >> >> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring >> trailing garbage are not documented, much less precisely specified. >> 2. Ignoring trailing garbage is dubious because it could easily hide errors >> or other data corruption that an application would rather be notified about. >> Moreover, the API claims that a `ZipException` will be thrown when corrupt >> data is read, but obviously that doesn't happen in the trailing garbage >> scenario (so N concatenated streams where the last one has a corrupted >> header frame is indistinguishable from N-1 valid streams). >> 3. There's no way to create a `GZIPInputStream` that does _not_ support >> stream concatenation. >> >> On the other hand, `GZIPInputStream` is an old class with lots of existing >> usage, so it's important to preserve the existing behavior, warts and all >> (note: my the definition of "existing behavior" here includes the bug fix in >> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)). >> >> So this patch adds a new constructor that takes two new parameters >> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is >> enabled by setting both to true; otherwise, they do what they sound like. In >> particular, when `allowTrailingGarbage` is false, then the underlying input >> must contain exactly one (if `allowConcatenation` is false) or exactly N (if >> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an >> exception is guaranteed. > > Archie Cobbs has updated the pull request with a new target base due to a > merge or a rebase. The pull request now contains 12 commits: > > - Bump @since from 23 to 24. > - Merge branch 'master' into JDK-8322256 > - Relabel "trailing garbage" as "extra bytes" to sound less accusatory. > - Simplify code by eliminating an impossible case. > - Field name change and Javadoc wording tweaks. > - Merge branch 'master' into JDK-8322256 > - Javadoc wording tweaks. > - Merge branch 'master' into JDK-8322256 > - Clarify exceptions: sometimes ZipException, sometimes EOFException. > - Merge branch 'master' into JDK-8322256 > - ... and 2 more: https://git.openjdk.org/jdk/compare/75dc2f85...f845a75b src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 153: > 151: */ > 152: public GZIPInputStream(InputStream in, int size, > 153: boolean allowConcatenation, boolean ignoreExtraBytes) throws > IOException { I haven't reviewed the javadoc changes. I will do that separately once we settle down on the API and implementation changes. As for this new constructor, I think the only new parameter we should introduce is whether or not the underlying input stream `in` is expected/allowed to have concatenated GZIP stream(s). So I think we should remove the "ignoreExtraBytes" new parameters from this constructor and. Additionally, I think the `allowConcatenation` should instead be named `allowConcatenatedGZIPStream`: public GZIPInputStream(InputStream in, int size, boolean allowConcatenatedGZIPStream) throws IOException { ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1677592647