[ https://issues.apache.org/jira/browse/COMPRESS-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16253732#comment-16253732 ]

Stefan Bodewig commented on COMPRESS-429:
-----------------------------------------
In my experience only WinZip uses the Unicode extra field; all others (apart
from Windows Compressed Folders, which doesn't support Unicode at all) have
switched to the EFS flag by now. So you may not want to put too much effort
into reading the extra field. In addition, given what WinZip does
(COMPRESS-427 and COMPRESS-176), it's hard to say one can trust its content.

{{hasUnicodeName()}} would be equivalent to
{{getExtraField(UnicodePathExtraField.UPATH_ID) != null}}, and you'd probably
want to call {{getExtraField}} anyway if this were true, just in case the
{{ZipFile}} or stream has been constructed with {{useUnicodeExtraFields}} set
to false.
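
To make the two signals concrete, here is a minimal JDK-only sketch of what such a check amounts to at the byte level. It does not use Commons Compress: the class {{UnicodeExtraScan}} and its methods are hypothetical names made up for illustration. It assumes the usual ZIP layout, i.e. extra data as a sequence of little-endian (header ID, size, payload) records, 0x7075 as the Unicode Path header ID, 0x6375 as the Unicode Comment header ID, and general purpose bit 11 as the EFS flag.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical helper, not part of Commons Compress.
final class UnicodeExtraScan {
    static final int UPATH_ID = 0x7075;  // Info-ZIP Unicode Path extra field
    static final int UCOM_ID = 0x6375;   // Info-ZIP Unicode Comment extra field
    static final int EFS_FLAG = 1 << 11; // general purpose bit 11

    /** Returns true if the raw extra data holds a complete record with the given header ID. */
    static boolean hasExtraField(byte[] extra, int headerId) {
        if (extra == null) {
            return false;
        }
        int pos = 0;
        while (pos + 4 <= extra.length) {
            int id = (extra[pos] & 0xFF) | ((extra[pos + 1] & 0xFF) << 8);
            int size = (extra[pos + 2] & 0xFF) | ((extra[pos + 3] & 0xFF) << 8);
            if (pos + 4 + size > extra.length) {
                return false; // truncated record, stop scanning
            }
            if (id == headerId) {
                return true;
            }
            pos += 4 + size;
        }
        return false;
    }

    /** Returns true if the EFS (UTF-8) flag is set in the general purpose bit flag. */
    static boolean usesEfsFlag(int generalPurposeFlags) {
        return (generalPurposeFlags & EFS_FLAG) != 0;
    }

    /** Builds a sample Unicode Path record: version 1, CRC32 of the legacy name, UTF-8 name. */
    static byte[] buildUnicodePathField(byte[] legacyName, String unicodeName) {
        byte[] utf8 = unicodeName.getBytes(StandardCharsets.UTF_8);
        CRC32 crc = new CRC32();
        crc.update(legacyName);
        long c = crc.getValue();
        int size = 1 + 4 + utf8.length;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(UPATH_ID & 0xFF);
        out.write(UPATH_ID >>> 8);
        out.write(size & 0xFF);
        out.write((size >>> 8) & 0xFF);
        out.write(1); // version
        for (int i = 0; i < 4; i++) {
            out.write((int) (c >>> (8 * i)) & 0xFF);
        }
        out.write(utf8, 0, utf8.length);
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] legacy = "a.txt".getBytes(StandardCharsets.US_ASCII);
        byte[] extra = buildUnicodePathField(legacy, "a.txt");
        System.out.println("Unicode path field present: " + hasExtraField(extra, UPATH_ID));
        System.out.println("Unicode comment field present: " + hasExtraField(extra, UCOM_ID));
        System.out.println("EFS flag set: " + usesEfsFlag(0x0800));
    }
}
```

With the JDK's own {{java.util.zip.ZipEntry}}, the byte array passed to {{hasExtraField}} could come from {{getExtra()}}, which returns null when no extra data is present. Note that finding the field only says it exists; as mentioned above, that alone does not mean its content can be trusted.
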
> Expose whether ZIP entry name & comment come from Unicode extra field
> ---------------------------------------------------------------------
>
> Key: COMPRESS-429
> URL: https://issues.apache.org/jira/browse/COMPRESS-429
> Project: Commons Compress
> Issue Type: Improvement
> Reporter: Damiano Albani
> Priority: Minor
> Labels: Unicode, ZIP
>
> It is a known fact that detecting the encoding of the name/comment of ZIP
> entries is a messy process. And that the general purpose bit 11 is often
> unreliable.
> Only the so-called Unicode extra field (if present) can be trusted to
> reliably determine a ZIP entry name & comment, as far as I understand.
> But the current API of Commons Compress doesn't (easily) expose in which
> situation the ZIP archive reader is.
> That's why I propose to add a couple of new getter/setter-exposed fields to
> {{ZipArchiveEntry}}, e.g.:
> {noformat}
> boolean hasUnicodeName
> boolean hasUnicodeComment
> {noformat}
> This way it can be easily determined if the value returned by
> {{ZipArchiveEntry::getName}} or {{ZipArchiveEntry::getComment}} can be
> trusted. Or if it needs some "character encoding sniffing" of sorts.
> What do you think?