PeterAlfreadLee opened a new pull request #90: Compress-477 : add zip64 support for split zip
URL: https://github.com/apache/commons-compress/pull/90

As was said in [#86](https://github.com/apache/commons-compress/pull/86):

> While rebasing I realized that master says the disk number start is a long while this branch uses an int. Actually both master and this branch neglect the possibility of the number of disks requiring Zip64 extra handling. TBH I find the idea of a split ZIP archive that spans more than 64k files a bit disturbing and would address it with a separate change independent of this PR.

This PR adds Zip64 support for split zip archives and fixes some bugs in the existing code.

1. For the Central File Header

As written in the Zip specification:

> 4.4.13 disk number start: (2 bytes) The number of the disk on which this file begins. If an archive is in ZIP64 format and the value in this field is 0xFFFF, the size will be in the corresponding 4 byte zip64 extended information extra field.

This means that `disk number start` may exceed the maximum value `0xFFFF`, in which case it is stored in the extra field. The existing code in `createCentralFileHeader` ignored this possibility and did not detect it. That is why I added `|| ze.getDiskNumberStart() >= ZIP64_MAGIC_SHORT`.

Some test cases in `Zip64SupportIT` are modified correspondingly:

(1) `extra field length` : 28 -> 32
(2) `size of extra` : 24 -> 28

2. For the End Of Central Directory

There are 6 variables that may have to be stored in the Zip64 End Of Central Directory and need to be checked:

```
numberOfThisDisk
cdDiskNumberStart
numOfEntriesOnThisDisk
numberOfEntries/entries.size()
cdLength
cdOffset
```

The existing code only checked `numberOfEntries/entries.size()` and `cdOffset`, so checks for the other 4 variables are added in this PR.

3. Some test cases for Zip64 split zip are added (they create 70,000+ split segments).
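The checks described in items 1 and 2 can be sketched roughly as follows. This is only an illustration, not the code in this PR: the class `Zip64Thresholds` and its method names are hypothetical, and the sketch assumes the 2-byte fields are compared against `0xFFFF` (`ZIP64_MAGIC_SHORT`) and the 4-byte fields against `0xFFFFFFFF` (called `ZIP64_MAGIC` here), using `>=` in the same way as the `getDiskNumberStart()` check above.

```java
/**
 * Hypothetical helper illustrating the Zip64 threshold checks.
 * Names and structure are assumptions, not the commons-compress API.
 */
final class Zip64Thresholds {
    /** Sentinel for 2-byte fields; a value this large no longer fits. */
    static final int ZIP64_MAGIC_SHORT = 0xFFFF;
    /** Sentinel for 4-byte fields (assumed name and value). */
    static final long ZIP64_MAGIC = 0xFFFFFFFFL;

    private Zip64Thresholds() { }

    /** Central File Header: does the disk number start need the Zip64 extra field? */
    static boolean cfhDiskNumberNeedsZip64(long diskNumberStart) {
        return diskNumberStart >= ZIP64_MAGIC_SHORT;
    }

    /** End Of Central Directory: does any of the six values overflow its field? */
    static boolean eocdNeedsZip64(long numberOfThisDisk,
                                  long cdDiskNumberStart,
                                  long numOfEntriesOnThisDisk,
                                  long numberOfEntries,
                                  long cdLength,
                                  long cdOffset) {
        return numberOfThisDisk >= ZIP64_MAGIC_SHORT        // 2-byte field
            || cdDiskNumberStart >= ZIP64_MAGIC_SHORT       // 2-byte field
            || numOfEntriesOnThisDisk >= ZIP64_MAGIC_SHORT  // 2-byte field
            || numberOfEntries >= ZIP64_MAGIC_SHORT         // 2-byte field
            || cdLength >= ZIP64_MAGIC                      // 4-byte field
            || cdOffset >= ZIP64_MAGIC;                     // 4-byte field
    }
}
```

A writer that is not allowed to use Zip64 would throw when either method returns `true`; otherwise it would write the sentinel value into the regular field and the real value into the Zip64 structure.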
When I was writing the test cases, I found it is not easy to test the Zip64 exceptions in the End Of Central Directory - the exceptions are always thrown earlier, when writing the Central Directory Header. So the newly added exceptions are not tested - I didn't find a proper way to test them.

P.S.: When I was trying to extract the 70,000+ split segments using `ZipSplitReadOnlySeekableByteChannel`, I got an error: `too many open files`. I realized that `ZipSplitReadOnlySeekableByteChannel` and `MultiReadOnlySeekableByteChannel` may open all the split segments before reading them - I think this is unnecessary and could be improved. I'm trying to improve them with an 'open files when needed' approach. That way we don't need to open all the files before reading, which can consume a lot of OS file handles, and we can extract split zips that contain a large number of split segments.
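The 'open files when needed' idea could look something like the sketch below. It is a simplified illustration under stated assumptions, not the planned change to `ZipSplitReadOnlySeekableByteChannel` or `MultiReadOnlySeekableByteChannel`: the class `LazySegmentReader` is hypothetical, it only supports sequential reads (no seeking), and it keeps at most one segment open at a time.

```java
import java.io.Closeable;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/**
 * Hypothetical sequential reader over split segments that opens each
 * segment only when reading reaches it and closes it once exhausted.
 */
final class LazySegmentReader implements Closeable {
    private final List<Path> segments;
    private int index;                    // index of the segment being read
    private int opened;                   // how many segments were opened so far
    private SeekableByteChannel current;  // at most one open channel at a time

    LazySegmentReader(List<Path> segments) {
        this.segments = segments;         // nothing is opened here
    }

    /** Reads from the current segment, moving on to the next one at EOF. */
    int read(ByteBuffer dst) throws IOException {
        if (!dst.hasRemaining()) {
            return 0;
        }
        while (index < segments.size()) {
            if (current == null) {
                current = Files.newByteChannel(segments.get(index), StandardOpenOption.READ);
                opened++;
            }
            int n = current.read(dst);
            if (n != -1) {
                return n;
            }
            // Segment exhausted: release its file handle before opening the next.
            current.close();
            current = null;
            index++;
        }
        return -1; // all segments consumed
    }

    /** Number of segments opened so far (for illustration and testing). */
    int openedSoFar() {
        return opened;
    }

    @Override
    public void close() throws IOException {
        if (current != null) {
            current.close();
            current = null;
        }
    }
}
```

With this shape the number of simultaneously open file handles stays at one regardless of how many segments the archive has. A real implementation would also need to support `position(long)`, which requires knowing segment sizes without keeping every channel open.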
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

With regards,
Apache Git Services
