Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

Jaikiran Pai Mon, 15 Jul 2024 03:09:30 -0700

On Thu, 6 Jun 2024 17:03:57 GMT, Archie Cobbs <[email protected]> wrote:


>> `GZIPInputStream` supports reading data from multiple concatenated GZIP data 
>> streams since [JDK-4691425](https://bugs.openjdk.org/browse/JDK-4691425). In 
>> order to do this, after a GZIP trailer frame is read, it attempts to read a 
>> GZIP header frame and, if successful, proceeds onward to decompress the new 
>> stream. If the attempt to decode a GZIP header frame fails, or happens to 
>> trigger an `IOException`, it just ignores the trailing garbage and/or the 
>> `IOException` and returns EOF.
>> 
>> There are several issues with this:
>> 
>> 1. The behaviors of (a) supporting concatenated streams and (b) ignoring 
>> trailing garbage are not documented, much less precisely specified.
>> 2. Ignoring trailing garbage is dubious because it could easily hide errors 
>> or other data corruption that an application would rather be notified about. 
>> Moreover, the API claims that a `ZipException` will be thrown when corrupt 
>> data is read, but obviously that doesn't happen in the trailing garbage 
>> scenario (so N concatenated streams where the last one has a corrupted 
>> header frame is indistinguishable from N-1 valid streams).
>> 3. There's no way to create a `GZIPInputStream` that does _not_ support 
>> stream concatenation.
>> 
>> On the other hand, `GZIPInputStream` is an old class with lots of existing 
>> usage, so it's important to preserve the existing behavior, warts and all 
>> (note: my the definition of "existing behavior" here includes the bug fix in 
>> [JDK-7036144](https://bugs.openjdk.org/browse/JDK-7036144)).
>> 
>> So this patch adds a new constructor that takes two new parameters 
>> `allowConcatenation` and `allowTrailingGarbage`. The legacy behavior is 
>> enabled by setting both to true; otherwise, they do what they sound like. In 
>> particular, when `allowTrailingGarbage` is false, then the underlying input 
>> must contain exactly one (if `allowConcatenation` is false) or exactly N (if 
>> `allowConcatenation` is true) concatenated GZIP data streams, otherwise an 
>> exception is guaranteed.
>
> Archie Cobbs has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 12 commits:
> 
>  - Bump @since from 23 to 24.
>  - Merge branch 'master' into JDK-8322256
>  - Relabel "trailing garbage" as "extra bytes" to sound less accusatory.
>  - Simplify code by eliminating an impossible case.
>  - Field name change and Javadoc wording tweaks.
>  - Merge branch 'master' into JDK-8322256
>  - Javadoc wording tweaks.
>  - Merge branch 'master' into JDK-8322256
>  - Clarify exceptions: sometimes ZipException, sometimes EOFException.
>  - Merge branch 'master' into JDK-8322256
>  - ... and 2 more: https://git.openjdk.org/jdk/compare/75dc2f85...f845a75b

src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 153:

> 151:      */
> 152:     public GZIPInputStream(InputStream in, int size,
> 153:             boolean allowConcatenation, boolean ignoreExtraBytes) throws 
> IOException {

I haven't reviewed the javadoc changes. I will do that separately once we 
settle down on the API and implementation changes.
As for this new constructor, I think the only new parameter we should introduce 
is whether or not the underlying input stream `in` is expected/allowed to have 
concatenated GZIP stream(s). So I think we should remove the "ignoreExtraBytes" 
new parameters from this constructor and. Additionally, I think the 
`allowConcatenation` should instead be named `allowConcatenatedGZIPStream`:


public GZIPInputStream(InputStream in, int size, boolean 
allowConcatenatedGZIPStream) throws IOException {

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1677592647

Re: RFR: 8322256: Define and document GZIPInputStream concatenated stream semantics [v5]

Reply via email to