On Fri, 19 Jul 2024 23:16:07 GMT, Archie Cobbs <aco...@openjdk.org> wrote:

>> src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 143:
>> 
>>> 141:      * @param allowConcatenation true to allow multiple concatenated 
>>> compressed data streams,
>>> 142:      *                           or false to expect exactly one 
>>> compressed data stream
>>> 143:      * @param ignoreExtraBytes true to tolerate and ignore extra 
>>> bytes, false to throw
>> 
>> The term "extra" here feels somewhat open to interpretation. Specifically, 
>> "extra" sounds like something that is out of the ordinary, but not uncommon 
>> or wrong. It could be used when describing an optional feature in a format 
>> specification.
>> 
>> The API description referes to "unexpected data". Perhaps the word 
>> "unexpected" is a more precise and descriptive term to use in this option? 
>> So `ignoreUnexpectedBytes` or `ignoreUnexpectedData`.
>> 
>> I think my eyebrows would be raised more when seeing someone ignoring 
>> 'unexpected data' rather than when ignoring 'extra data'.
>> 
>> I know this might smell of bikeshedding, but naming is important (and hard!).
>
>> The term "extra" here feels somewhat open to interpretation. Specifically, 
>> "extra" sounds like something that is out of the ordinary, but not uncommon 
>> or wrong. It could be used when describing an optional feature in a format 
>> specification.
>> 
>> The API description referes to "unexpected data". Perhaps the word 
>> "unexpected" is a more precise and descriptive term to use in this option? 
>> So ignoreUnexpectedBytes or ignoreUnexpectedData.
> 
> This is a good point.. because it's actually a bit subtle what kind of data 
> is being referred to here as being "ignored". Even "unexpected" doesn't quite 
> capture it.
> 
> To be precise, here's what bytes the  `ignoreExtraBytes` parameter is going 
> to cause to be ignored:
> * In the case `allowConcatenation == false`, it really does refer to "extra" 
> or "extraneous" data, that is: one or more bytes having any value appearing 
> _after_ the end of the GZIP trailer frame.
> * In the case `allowConcatenation == true`, it means a truncated or invalid 
> GZIP _header frame_ appearing after the end of any GZIP trailer frame.
> 
> The second case means "unexpected" seems wrong because why would unexpected 
> data in a header frame be treated any differently from unexpected data any 
> other frame (which of course is not going to be ignored but instead will 
> trigger an exception)?
> 
> So maybe this is just more evidence that we shouldn't use boolean parameters 
> - because they're not really orthogonal.
> 
> What about something like:
> 
> 
> public enum ConcatPolicy {
>     DISALLOW_STRICT,    // one GZIP stream, nothing else
>     DISALLOW_LENIENT,   // one GZIP stream, ignore any extra byte(s)
>     ALLOW_STRICT,       // N GZIP streams, nothing else
>     ALLOW_LENIENT;      // N GZIP streams, stop at, and ignore, an invalid 
> header frame
> }
> 
> The legacy behavior would of course correspond to `ALLOW_LENIENT`.

Another thought:

Are we sure we would want to introduce a new mode now which does _not_ allow 
concatenation, but _does_ ignore trailing data?

If the ignore mode is really discouraged, why open a new mode of config 
allowing it?

In other words, could we instead (at least conceptually)  have the policies 
LEGACY, SINGLE_STREAM, CONCATENATED_STREAM where the latter two always reject 
trailing data?

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1686942202

Reply via email to