[GitHub] [commons-compress] PeterAlfreadLee opened a new pull request #90: Compress-477 : add zip64 support for split zip

GitBox Fri, 10 Jan 2020 18:36:58 -0800

PeterAlfreadLee opened a new pull request #90: Compress-477 : add zip64 support 
for split zip
URL: https://github.com/apache/commons-compress/pull/90
 
 
   As noted in [#86](https://github.com/apache/commons-compress/pull/86):
   
   > While rebasing I realized that master says the disk number start is a long 
while this branch uses an int. Actually both master and this branch neglect the 
possibility of the number of disks requiring Zip64 extra handling. TBH I find 
the idea of a split ZIP archive that spans more than 64k files a bit disturbing 
and would address it with a separate change independent of this PR.
   
   This PR adds Zip64 support for split ZIP archives and fixes some bugs in 
the existing code.
   
   1. Central File Header
   The ZIP specification says:
   
   > 4.4.13 disk number start: (2 bytes)
          The number of the disk on which this file begins.  If an 
          archive is in ZIP64 format and the value in this field is 
          0xFFFF, the size will be in the corresponding 4 byte zip64 
          extended information extra field.
   
   This means `disk number start` can reach or exceed the maximum value 
`0xFFFF`, in which case the real value has to be stored in the Zip64 extra 
field. The existing code in `createCentralFileHeader` did not check for this 
case, so I added `|| ze.getDiskNumberStart() >= ZIP64_MAGIC_SHORT` to the 
condition.
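   
   As a rough illustration of the extended condition (a sketch, not the actual patch): the class name, method name and parameter list below are hypothetical, and the two constants are local copies of the usual Zip64 marker values.

```java
// Hypothetical helper illustrating the check; not the real commons-compress code.
final class CentralHeaderZip64Check {
    // Assumed marker values: a 4-byte field saturates at 0xFFFFFFFF,
    // a 2-byte field saturates at 0xFFFF.
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;
    static final int ZIP64_MAGIC_SHORT = 0xFFFF;

    // Returns true if any value no longer fits its fixed-width field in the
    // central file header and therefore has to go into the Zip64 extra field.
    static boolean needsZip64Extra(long size, long compressedSize,
                                   long localHeaderOffset, long diskNumberStart) {
        return size >= ZIP64_MAGIC
            || compressedSize >= ZIP64_MAGIC
            || localHeaderOffset >= ZIP64_MAGIC
            || diskNumberStart >= ZIP64_MAGIC_SHORT; // the newly added condition
    }
}
```

   With this check, an entry that starts on disk 70,000 is flagged even when its sizes and offset are small.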
   
   Some test cases in `Zip64SupportIT` are modified correspondingly:
   (1) `extra field length` : 28 -> 32
   (2) `size of extra` : 24 -> 28
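   
   The 4-byte difference in both numbers is the disk start number in the Zip64 extended information extra field. The arithmetic can be sketched as follows, assuming the three 8-byte fields (original size, compressed size, local header offset) are all present, which is consistent with the old value of 24. The class and method names are hypothetical.

```java
// Hypothetical helper showing where 24 -> 28 and 28 -> 32 come from.
final class Zip64ExtraLength {
    static final int HEADER_ID_BYTES = 2;   // extra field tag
    static final int DATA_SIZE_BYTES = 2;   // "size of extra" field itself
    static final int LONG_FIELD_BYTES = 8;  // size, compressed size, offset
    static final int DISK_START_BYTES = 4;  // disk start number

    // Payload length, i.e. the value written as "size of extra".
    static int dataSize(boolean withDiskStart) {
        return 3 * LONG_FIELD_BYTES + (withDiskStart ? DISK_START_BYTES : 0);
    }

    // Full length including tag and size, i.e. the "extra field length".
    static int totalLength(boolean withDiskStart) {
        return HEADER_ID_BYTES + DATA_SIZE_BYTES + dataSize(withDiskStart);
    }
}
```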
   
   2. End Of Central Directory
   Six values may have to be stored in the Zip64 End Of Central Directory and 
need to be checked:
   ```
   numberOfThisDisk
   cdDiskNumberStart
   numOfEntriesOnThisDisk
   numberOfEntries/entries.size()
   cdLength
   cdOffset
   ```
   The existing code only checked `numberOfEntries/entries.size()` and 
`cdOffset`, so this PR adds checks for the other 4 values.
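   
   A minimal sketch of the combined check, assuming the field widths of the classic End Of Central Directory record (2 bytes for the two disk numbers and the two entry counts, 4 bytes for the central directory length and offset). The class and method names are hypothetical, and the exact comparisons in the patch may differ.

```java
// Hypothetical helper; illustrates the idea, not the real commons-compress code.
final class EocdZip64Check {
    static final long ZIP64_MAGIC = 0xFFFFFFFFL; // limit of a 4-byte field
    static final int ZIP64_MAGIC_SHORT = 0xFFFF; // limit of a 2-byte field

    // Returns true if at least one of the six values does not fit the classic
    // record, so a Zip64 End Of Central Directory record is required.
    static boolean needsZip64Eocd(long numberOfThisDisk, long cdDiskNumberStart,
                                  long numOfEntriesOnThisDisk, long numberOfEntries,
                                  long cdLength, long cdOffset) {
        return numberOfThisDisk >= ZIP64_MAGIC_SHORT
            || cdDiskNumberStart >= ZIP64_MAGIC_SHORT
            || numOfEntriesOnThisDisk >= ZIP64_MAGIC_SHORT
            || numberOfEntries >= ZIP64_MAGIC_SHORT
            || cdLength >= ZIP64_MAGIC
            || cdOffset >= ZIP64_MAGIC;
    }
}
```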
   
   3. Test cases for Zip64 split ZIP archives are added (they create 70,000+ 
split segments).
   
   While writing the test cases I found it hard to trigger the Zip64 
exceptions in the End Of Central Directory code: the exception is always thrown 
earlier, while writing the Central Directory Header. The newly added exceptions 
are therefore untested, as I didn't find a proper way to test them.
   
   P.S.:
   When I tried to extract the 70,000+ split segments using 
`ZipSplitReadOnlySeekableByteChannel`, I got the error `too many open files`. 
I realized that `ZipSplitReadOnlySeekableByteChannel` and 
`MultiReadOnlySeekableByteChannel` may open all the split segments before 
reading them. I think this is unnecessary and could be improved, so I'm trying 
to change them to open files only when needed. That way we would not hold one 
OS file handle per segment up front, and we could extract split ZIPs that 
consist of a very large number of segments.
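   
   A minimal sketch of the "open files when needed" idea, for sequential reads only. `LazySegmentReader` is a hypothetical class, not part of commons-compress; a real `SeekableByteChannel` implementation would also have to handle `position` and `size`.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

// Hypothetical sketch: reads the segments as one stream while keeping
// at most one file handle open at any time.
final class LazySegmentReader implements AutoCloseable {
    private final List<Path> segments;
    private int index = -1;               // index of the currently open segment
    private SeekableByteChannel current;  // null when no segment is open

    LazySegmentReader(List<Path> segments) {
        this.segments = segments;
    }

    // Fills dst from the concatenated segments; returns -1 at end of data.
    int read(ByteBuffer dst) throws IOException {
        if (!dst.hasRemaining()) {
            return 0;
        }
        int total = 0;
        while (dst.hasRemaining()) {
            if (current == null && !openNext()) {
                break; // no segments left
            }
            int n = current.read(dst);
            if (n < 0) {
                // Segment exhausted: release its handle before opening the next.
                current.close();
                current = null;
                continue;
            }
            total += n;
        }
        return total == 0 ? -1 : total;
    }

    // Opens the next segment on demand; returns false if none is left.
    private boolean openNext() throws IOException {
        if (index + 1 >= segments.size()) {
            return false;
        }
        index++;
        current = Files.newByteChannel(segments.get(index), StandardOpenOption.READ);
        return true;
    }

    // Number of file handles currently held (0 or 1).
    int openHandles() {
        return current == null ? 0 : 1;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

   The trade-off is that random access across segments needs a reopen whenever the position moves to a different segment, in exchange for a constant number of open handles.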


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
