On Fri, 19 Jul 2024 23:16:07 GMT, Archie Cobbs <aco...@openjdk.org> wrote:
>> src/java.base/share/classes/java/util/zip/GZIPInputStream.java line 143: >> >>> 141: * @param allowConcatenation true to allow multiple concatenated >>> compressed data streams, >>> 142: * or false to expect exactly one >>> compressed data stream >>> 143: * @param ignoreExtraBytes true to tolerate and ignore extra >>> bytes, false to throw >> >> The term "extra" here feels somewhat open to interpretation. Specifically, >> "extra" sounds like something that is out of the ordinary, but not uncommon >> or wrong. It could be used when describing an optional feature in a format >> specification. >> >> The API description referes to "unexpected data". Perhaps the word >> "unexpected" is a more precise and descriptive term to use in this option? >> So `ignoreUnexpectedBytes` or `ignoreUnexpectedData`. >> >> I think my eyebrows would be raised more when seeing someone ignoring >> 'unexpected data' rather than when ignoring 'extra data'. >> >> I know this might smell of bikeshedding, but naming is important (and hard!). > >> The term "extra" here feels somewhat open to interpretation. Specifically, >> "extra" sounds like something that is out of the ordinary, but not uncommon >> or wrong. It could be used when describing an optional feature in a format >> specification. >> >> The API description referes to "unexpected data". Perhaps the word >> "unexpected" is a more precise and descriptive term to use in this option? >> So ignoreUnexpectedBytes or ignoreUnexpectedData. > > This is a good point.. because it's actually a bit subtle what kind of data > is being referred to here as being "ignored". Even "unexpected" doesn't quite > capture it. > > To be precise, here's what bytes the `ignoreExtraBytes` parameter is going > to cause to be ignored: > * In the case `allowConcatenation == false`, it really does refer to "extra" > or "extraneous" data, that is: one or more bytes having any value appearing > _after_ the end of the GZIP trailer frame. > * In the case `allowConcatenation == true`, it means a truncated or invalid > GZIP _header frame_ appearing after the end of any GZIP trailer frame. > > The second case means "unexpected" seems wrong because why would unexpected > data in a header frame be treated any differently from unexpected data any > other frame (which of course is not going to be ignored but instead will > trigger an exception)? > > So maybe this is just more evidence that we shouldn't use boolean parameters > - because they're not really orthogonal. > > What about something like: > > > public enum ConcatPolicy { > DISALLOW_STRICT, // one GZIP stream, nothing else > DISALLOW_LENIENT, // one GZIP stream, ignore any extra byte(s) > ALLOW_STRICT, // N GZIP streams, nothing else > ALLOW_LENIENT; // N GZIP streams, stop at, and ignore, an invalid > header frame > } > > The legacy behavior would of course correspond to `ALLOW_LENIENT`. Another thought: Are we sure we would want to introduce a new mode now which does _not_ allow concatenation, but _does_ ignore trailing data? If the ignore mode is really discouraged, why open a new mode of config allowing it? In other words, could we instead (at least conceptually) have the policies LEGACY, SINGLE_STREAM, CONCATENATED_STREAM where the latter two always reject trailing data? ------------- PR Review Comment: https://git.openjdk.org/jdk/pull/18385#discussion_r1686942202