[
https://issues.apache.org/jira/browse/COMPRESS-565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287589#comment-17287589
]
Peter Lee commented on COMPRESS-565:
------------------------------------
I'm not familiar with the *Expand-Archive PowerShell utility*. Is it open
source? I can't find anything on Google.
7zip is open source, but I'm not familiar with its code. :(
The difference between calling
_output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)_
and not calling it is
whether the _Info-ZIP Unicode Path Extra Field_ is added to the entry's extra
field data or not. I think the reason why 7z is complaining and the
*Expand-Archive PowerShell utility* on Windows can't extract the archive is:
*_Info-ZIP Unicode Path Extra Field_ is not supported by them*.
See also section 4.6.9 of the
[zip APPNOTE|https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT] for more
detailed information.
With _ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS_ being set, we will
always add the _Info-ZIP Unicode Path Extra Field_, which can be seen in the
generated zip:
!image-2021-02-20-15-51-21-747.png!
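For anyone who prefers to check this without a hex editor, here is a minimal sketch that looks for the field in the raw extra data of one entry. It only assumes the generic extra field layout from the APPNOTE (2-byte header ID, 2-byte data size, then the data, all little-endian). The class and method names are made up for illustration and are not part of Commons Compress.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Commons Compress.
final class ExtraFieldScanner {
    /** Header ID of the Info-ZIP Unicode Path Extra Field (APPNOTE 4.6.9). */
    public static final int UNICODE_PATH_ID = 0x7075;

    /**
     * Walks a sequence of [header ID][data size][data] blocks and returns
     * the data of the first block with the given header ID, or null if
     * there is no such block or the data is truncated.
     */
    public static byte[] findBlock(byte[] extra, int headerId) {
        int pos = 0;
        while (pos + 4 <= extra.length) {
            // Both 2-byte values are stored least significant byte first.
            int id = (extra[pos] & 0xff) | ((extra[pos + 1] & 0xff) << 8);
            int size = (extra[pos + 2] & 0xff) | ((extra[pos + 3] & 0xff) << 8);
            int start = pos + 4;
            if (start + size > extra.length) {
                return null; // truncated block, stop scanning
            }
            if (id == headerId) {
                return Arrays.copyOfRange(extra, start, start + size);
            }
            pos = start + size; // skip to the next block
        }
        return null;
    }
}
```

The input could be, for example, the array returned by _java.util.zip.ZipEntry.getExtra()_ for an entry of the generated archive. A non-null result for _UNICODE_PATH_ID_ means the field is present.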
Here is a short explanation of the bytes:
First of all, the zip format is little-endian, so multi-byte values are stored
with their least significant byte first.
The first 2 bytes, 0x7075, are the signature of the _Info-ZIP Unicode Path Extra
Field_. The next 2 bytes, 0x000e, are the size of this field, which is 14.
The 0x01 is the version of this extra field, which is currently always 1 (according to
the [zip APPNOTE|https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT]).
The 4 bytes 0x7df6c07c are the CRC32 checksum of the file name (which can be
easily checked with any CRC32 tool using the name _input.bin_).
The 9 bytes 0x69 6e 70 75 74 2e 62 69 6e are the UTF-8 encoding of the file name,
which is _input.bin_.
You can see that 9 + 4 + 1 = 14 is exactly the size of this field that I
mentioned. So I think we have built a correct _Info-ZIP Unicode Path Extra
Field_.
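To make the layout concrete, here is a rough sketch that assembles the same field by hand with plain JDK classes. It is not the Commons Compress implementation; the class and method names are hypothetical. It assumes the name stored in the regular header has the same bytes as the UTF-8 name (true for an ASCII name like _input.bin_) and that the name is short enough for the size to fit in 2 bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical sketch, not the Commons Compress implementation.
final class UnicodePathFieldSketch {
    public static final int HEADER_ID = 0x7075; // signature of the field
    public static final int VERSION = 1;        // currently always 1

    /** Returns header ID + size + version + name CRC32 + UTF-8 name. */
    public static byte[] build(String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);

        // ASSUMPTION: the header file name bytes equal the UTF-8 bytes,
        // so the CRC32 is computed over the UTF-8 name directly.
        CRC32 crc = new CRC32();
        crc.update(utf8);

        int dataSize = 1 + 4 + utf8.length; // version + CRC32 + name
        ByteBuffer buf = ByteBuffer.allocate(4 + dataSize)
                .order(ByteOrder.LITTLE_ENDIAN); // zip is little-endian
        buf.putShort((short) HEADER_ID);
        buf.putShort((short) dataSize);
        buf.put((byte) VERSION);
        buf.putInt((int) crc.getValue());
        buf.put(utf8);
        return buf.array();
    }
}
```

Calling _UnicodePathFieldSketch.build("input.bin")_ gives 18 bytes: the 4-byte header followed by the 14 bytes of data counted above. The CRC32 bytes in the result can be compared against the value seen in the archive.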
> Regression - Corrupted headers when using 64 bit ZipArchiveOutputStream
> -----------------------------------------------------------------------
>
> Key: COMPRESS-565
> URL: https://issues.apache.org/jira/browse/COMPRESS-565
> Project: Commons Compress
> Issue Type: Bug
> Components: Archivers
> Affects Versions: 1.20
> Reporter: Evgenii Bovykin
> Assignee: Peter Lee
> Priority: Major
> Attachments: image-2021-02-20-15-51-21-747.png
>
>
> We've recently updated the commons-compress library from version 1.9 to 1.20 and
> are now experiencing a problem that didn't occur before.
>
> When using ZipArchiveOutputStream to archive a 5 GB file and setting the
> following options
> {{output.setUseZip64(Zip64Mode.Always)}}
>
> {{output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)}}
> the resulting archive contains corrupted headers.
> *The Expand-Archive PowerShell utility cannot extract the archive at all and
> reports an error about a corrupted header. 7zip also complains about it, but
> can extract the archive.*
>
> The problem didn't appear when using library version 1.9.
>
> I've created a sample project that reproduces the error -
> [https://github.com/missingdays/commons-compress-example]
> The issue doesn't reproduce if you do any of the following:
>
> # Downgrade the library to version 1.9
> # Remove
> output.setCreateUnicodeExtraFields(ZipArchiveOutputStream.UnicodeExtraFieldPolicy.ALWAYS)
> # Remove output.setUseZip64(Zip64Mode.Always) and zip a smaller file (e.g. 1 GB)