Re: RFR: 8303866: Allow ZipInputStream.readEnd to parse small Zip64 ZIP files [v9]

Jaikiran Pai Mon, 08 Jan 2024 07:13:45 -0800

On Fri, 22 Dec 2023 07:55:24 GMT, Eirik Bjørsnøs <[email protected]> wrote:


>> ZipInputStream.readEnd currently assumes a Zip64 data descriptor if the 
>> number of compressed or uncompressed bytes read from the inflater is larger 
>> than the Zip64 magic value.
>> 
>> While the ZIP format mandates that the data descriptor `SHOULD be stored in 
>> ZIP64 format (as 8 byte values) when a file's size exceeds 0xFFFFFFFF`, it 
>> also states that `ZIP64 format MAY be used regardless of the size of a 
>> file`. For such small entries, the above assumption does not hold.
>> 
>> This PR augments ZipInputStream.readEnd to also assume 8-byte sizes if the 
>> ZipEntry includes a Zip64 extra information field. This brings 
>> ZipInputStream into alignment with the APPNOTE format spec:
>> 
>> 
>> When extracting, if the zip64 extended information extra 
>> field is present for the file the compressed and 
>> uncompressed sizes will be 8 byte values.
>> 
>> 
>> While small Zip64 files with 8-byte data descriptors are not commonly found 
>> in the wild, it is possible to create one using the Info-ZIP command line 
>> `-fd` flag:
>> 
>> `echo hello | zip -fd > hello.zip`
>> 
>> The PR also adds a test verifying that such a small Zip64 file can be parsed 
>> by ZipInputStream.
>
> Eirik Bjørsnøs has updated the pull request with a new target base due to a 
> merge or a rebase. The pull request now contains 33 commits:
> 
>  - Merge branch 'master' into data-descriptor
>  - Extract ZIP64_BLOCK_SIZE_OFFSET as a constant
>  - A Zip64 extra field used in a LOC header must include both the 
> uncompressed and compressed size fields, and does not include local header 
> offset or disk start number fields. Consequently, a valid LOC Zip64 block must 
> always be 16 bytes long.
>  - Document better the zip command and options used to generate the test 
> vector ZIP
>  - Fix spelling of "presence"
>  - Add a @bug reference in the test
>  - Use the term "block size" when referring to the size of a Zip64 extra 
> field data block
>  - Update comment to reflect that a Zip64 extended field in a LOC header has 
> only two valid block sizes
>  - Convert test from testNG to JUnit
>  - Fix the check that the size of an extra field block size must not grow 
> past the total extra field length
>  - ... and 23 more: https://git.openjdk.org/jdk/compare/e2042421...ddff130f

src/java.base/share/classes/java/util/zip/ZipInputStream.java line 692:

> 690:     private static boolean isZip64ExtBlockSizeValid(int blockSize) {
> 691:         // Uncompressed and compressed size fields are 8 bytes each
> 692:         return blockSize == 16;

I'm not following this check. As far as I can see, the `blockSize` passed 
to this method is the size of the Zip64 extra field block, and per the spec:


4.5.3 -Zip64 Extended Information Extra Field (0x0001):

      The following is the layout of the zip64 extended 
      information "extra" block. 

      ....

        Value      Size       Description
        -----      ----       -----------
(ZIP64) 0x0001     2 bytes    Tag for this "extra" block type
        Size       2 bytes    Size of this "extra" block
        Original 
        Size       8 bytes    Original uncompressed file size
        Compressed
        Size       8 bytes    Size of compressed data
        Relative Header
        Offset     8 bytes    Offset of local header record
        Disk Start
        Number     4 bytes    Number of the disk on which
                              this file starts 

So shouldn't it be 8 + 8 + 8 + 4 = 28 bytes and not 16 bytes? Did I 
misunderstand the code or the spec?
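
To make the two numbers concrete, here is a minimal sketch of the arithmetic. 
It is not the PR's code: the class, constant and method names are hypothetical, 
and the 16-byte LOC figure simply restates the commit message quoted above 
(both size fields present, no header offset or disk start number). The helper 
walks a little-endian extra field and returns the declared size of the first 
Zip64 block.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration; none of these names come from ZipInputStream.
class Zip64BlockSizeSketch {
    static final int ZIP64_TAG = 0x0001;

    // Field widths from the APPNOTE 4.5.3 table
    static final int ORIGINAL_SIZE_BYTES = 8;
    static final int COMPRESSED_SIZE_BYTES = 8;
    static final int HEADER_OFFSET_BYTES = 8;
    static final int DISK_START_BYTES = 4;

    // Sum of every field listed in the table: 8 + 8 + 8 + 4 = 28
    static final int FULL_LAYOUT_BLOCK_SIZE = ORIGINAL_SIZE_BYTES
            + COMPRESSED_SIZE_BYTES + HEADER_OFFSET_BYTES + DISK_START_BYTES;

    // Per the commit message: a LOC block carries only the two sizes: 8 + 8 = 16
    static final int LOC_BLOCK_SIZE = ORIGINAL_SIZE_BYTES + COMPRESSED_SIZE_BYTES;

    // Returns the declared data size of the first Zip64 block, or -1 if there
    // is none or a block claims to extend past the end of the extra field.
    static int findZip64BlockSize(byte[] extra) {
        ByteBuffer buf = ByteBuffer.wrap(extra).order(ByteOrder.LITTLE_ENDIAN);
        while (buf.remaining() >= 4) {
            int tag = Short.toUnsignedInt(buf.getShort());
            int size = Short.toUnsignedInt(buf.getShort());
            if (size > buf.remaining()) {
                return -1;
            }
            if (tag == ZIP64_TAG) {
                return size;
            }
            buf.position(buf.position() + size); // skip an unrelated block
        }
        return -1;
    }

    public static void main(String[] args) {
        // A LOC-style extra field: tag, size, then the two 8-byte sizes
        byte[] extra = ByteBuffer.allocate(4 + LOC_BLOCK_SIZE)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) ZIP64_TAG)
                .putShort((short) LOC_BLOCK_SIZE)
                .putLong(6L)   // uncompressed size
                .putLong(8L)   // compressed size
                .array();
        System.out.println("full layout: " + FULL_LAYOUT_BLOCK_SIZE);
        System.out.println("LOC layout:  " + LOC_BLOCK_SIZE);
        System.out.println("parsed size: " + findZip64BlockSize(extra));
    }
}
```

The 2-byte tag and 2-byte size are the block header and are not counted in 
either total, which is why the sum in my question starts at the first 8-byte 
field.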

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/12524#discussion_r1444786479
