Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Lance Andersen Mon, 22 Mar 2021 14:42:13 -0700

On Fri, 19 Mar 2021 08:19:58 GMT, Lin Zang <lz...@openjdk.org> wrote:


>> 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional 
>> GZIP fields
>
> Lin Zang has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - update copyright
>  - reuse arguments constructor for non-argument one.

Hi Lin,
 
Again, thank you for taking on this 19+ year feature request.

I have not done a deep dive on the CSR, but wanted to get a few comments back 
to you on a quick scan of the last PR update.

Personally, I would like to see more testing for a change such as this given 
the age of the code:

- Please do not modify the existing test, you can either create  a new test(s) 
or add tests to the existing test class
- We should capture Gzip files with these headers set from other tools and 
store the Gzip in an array within the test which can then be written to disk so 
the tests can validate interoperability.  Please see some of the other Zip 
tests for an example
- We should have tests that include some , but not all of the additional fields 
as I believe they are all optional according to the RFC.
- Please include some negative tests


I have also include some additional comments within the code

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 193:

> 191:      * @param generateHeaderCRC
> 192:      *        if {@code true} the header will include the CRC16 of the 
> header.
> 193:      * @param extraFieldBytes the byte array of extra filed,

filed -> field

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 195:

> 193:      * @param extraFieldBytes the byte array of extra filed,
> 194:      *                        the generated header would calculate the 
> byte[] size
> 195:      *                        and fill it before the byte[] in header.

This is not clear what you are trying to articulate here

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 269:

> 267:      * @param generateHeaderCRC
> 268:      *        if {@code true} the header will include the CRC16 of the 
> header.
> 269:      * @param extraFieldBytes the byte array of extra filed,

filed -> field

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 320:

> 318:              *     +---+---+=================================+
> 319:              */
> 320:             int xlen = extraFieldBytes.length;

On a quick look at the RFC,  I noticed the following:

------------------------
2.3.1.1. Extra field

         If the FLG.FEXTRA bit is set, an "extra field" is present in
         the header, with total length XLEN bytes.  It consists of a
         series of subfields, each of the form:

            +---+---+---+---+==================================+
            |SI1|SI2|  LEN  |... LEN bytes of subfield data ...|
            +---+---+---+---+==================================+

         SI1 and SI2 provide a subfield ID, typically two ASCII letters
         with some mnemonic value.  Jean-Loup Gailly
         <g...@prep.ai.mit.edu> is maintaining a registry of subfield
         IDs; please send him any subfield ID you wish to use.  Subfield
         IDs with SI2 = 0 are reserved for future use.  The following
         IDs are currently defined:


---------------

This does not seem to be accounted for or is it?

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 323:

> 321:             if (xlen > 0xffff) {
> 322:                 throw new ZipException("extra field size out of range");
> 323:             }

Where is the ZipException documented that is being thrown?

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 342:

> 340:              *    +=========================================+
> 341:              */
> 342:             out.write(filename);

The RFC indicates:

             If FNAME is set, an original file name is present,
            terminated by a zero byte.  The name must consist of ISO
            8859-1 (LATIN-1) characters; on operating systems using
            EBCDIC or any other character set for file names, the name
            must be translated to the ISO LATIN-1 character set.  This
            is the original name of the file being compressed, with any
            directory components removed, and, if the file being
            compressed is on a file system with case insensitive names,
            forced to lower case. There is no original file name if the
            data was compressed from a source other than a named file;
            for example, if the source was stdin on a Unix system, there
            is no file name.


It looks like there is no validation being done so I am not sure what the 
expectation is.

src/java.base/share/classes/java/util/zip/GZIPOutputStream.java line 357:

> 355:              */
> 356:             out.write(fileComment);
> 357:             out.write(0);

The RFC states:

            If FCOMMENT is set, a zero-terminated file comment is
            present.  This comment is not interpreted; it is only
            intended for human consumption.  The comment must consist of
            ISO 8859-1 (LATIN-1) characters.  Line breaks should be
            denoted by a single line feed character (10 decimal).


So should the characters be validates as well as line breaks?

test/jdk/java/util/zip/GZIP/GZIPOutputStreamHeaderTest.java line 3:

> 1: /*
> 2:  * Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
> 3:  * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.

The copyright would be 2020, 2021,

-------------

Changes requested by lancea (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3072

Re: RFR: 4890732: GZIPOutputStream doesn't support, in fact thwarts, use of optional GZIP fields [v3]

Reply via email to