[ 
https://issues.apache.org/jira/browse/COMPRESS-176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13215704#comment-13215704
 ] 

Stefan Bodewig commented on COMPRESS-176:
-----------------------------------------

This is what InfoZIP's zip on Linux says:

{noformat}
stefanb@brick:~$ zip -Tv Desktop/test-winzip.zip 
Archive:  Desktop/test-winzip.zip
    testing: doc.txt.gz               OK
    testing: doc2.txt                 OK
    testing: ??\                      OK
    testing: ??\??zip.zip             OK
    testing: ??\??.txt                OK
No errors detected in compressed data of Desktop/test-winzip.zip.
test of Desktop/test-winzip.zip OK
{noformat}

The entry for the directory contains a Unicode extra field whose UTF-8 encoded 
name is the byte sequence 0xc3 0xa4 0x5c, which decodes to "ä\".
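
As a quick check of that decoding, here is a minimal sketch using only the 
JDK. The class and method names are made up for illustration and are not part 
of Commons Compress; the three bytes are the ones quoted above.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: the class and method names are hypothetical.
final class UnicodeNameDemo {

    // Decodes raw bytes as UTF-8, the encoding of the name stored in the
    // Unicode extra field discussed above.
    static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The bytes reported for the directory entry's Unicode name.
        byte[] raw = {(byte) 0xc3, (byte) 0xa4, 0x5c};
        String name = decodeUtf8(raw);

        // 0xc3 0xa4 is U+00E4 (a with diaeresis), 0x5c is a backslash.
        System.out.println(name.length());        // prints 2
        System.out.println(name.endsWith("\\"));  // prints true
        System.out.println(name.endsWith("/"));   // prints false
    }
}
```

The last check is the relevant one: the decoded name ends with a backslash, 
not with the "/" that marks a directory.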

Since directory names in ZIP archives must end with "/", Compress doesn't 
detect this entry as a directory.  It may be possible to create a workaround 
like "if the plain name ends with a / and the Unicode name uses a \, then 
bend it", but I can't say I'd like that.
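
For illustration only, the heuristic quoted above could be sketched roughly 
as follows. This is not something Compress does; the class name, the method 
name and the way the two names are passed in are all assumptions made for 
this example.

```java
// Hypothetical helper, NOT part of Commons Compress. It sketches the
// "plain name ends with / but Unicode name uses \" heuristic.
final class BackslashDirWorkaround {

    // plainName:   the entry name from the regular ZIP header
    // unicodeName: the name taken from the Unicode extra field
    // Returns the Unicode name, with backslashes turned into slashes only
    // when the plain name marks the entry as a directory.
    static String normalize(String plainName, String unicodeName) {
        if (plainName.endsWith("/") && unicodeName.indexOf('\\') >= 0) {
            return unicodeName.replace('\\', '/');
        }
        return unicodeName;
    }

    public static void main(String[] args) {
        // "?/" stands in for whatever the plain name decodes to; the exact
        // value is an assumption, only the trailing slash matters here.
        String fixed = normalize("?/", "\u00e4\\");
        System.out.println(fixed.endsWith("/"));  // prints true
    }
}
```

As written, the sketch only touches entries whose plain name ends with "/", 
so it covers the directory entry itself and leaves every other entry's name 
unchanged.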

Java 6 likely works because it doesn't know about Unicode extra fields and 
simply uses the "plain" name.  You'd get the same behavior from 
ZipArchiveInputStream by setting useUnicodeExtraFields to false in the 
constructor.
                
> ArchiveInputStream#getNextEntry(): Problems with WinZip directories with 
> Umlauts
> --------------------------------------------------------------------------------
>
>                 Key: COMPRESS-176
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-176
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Archivers
>    Affects Versions: 1.3
>         Environment: Windows 7
>            Reporter: Wurstbrot mit Senf
>         Attachments: test-7zip.zip, test-windows.zip, test-winzip.zip
>
>
> There is a problem when handling a WinZip-created zip with Umlauts in 
> directories.
> I'm accessing a zip file created with WinZip containing a directory with an 
> umlaut ("ä") with ArchiveInputStream. When creating the zip file the 
> unicode-flag of winzip had been active.
> The following problem occurs when accessing the entries of the zip:
> the ArchiveEntry for a directory containing an umlaut is not marked as a 
> directory and the file names for the directory and all files contained in 
> that directory contain backslashes instead of slashes (i.e. completely 
> different to all other files in directories with no umlaut in their path).
> There is no difference when letting the ArchiveStreamFactory decide which 
> ArchiveInputStream to create or when using the ZipArchiveInputStream 
> constructor with the correct encoding (I've tried different encodings CP437, 
> CP850, ISO-8859-15, but still the problem persisted).
> This problem does not occur when using the very same zip file but compressed 
> by 7zip or the built-in Windows 7 zip functionality.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

