[ https://issues.apache.org/jira/browse/COMPRESS-429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16266991#comment-16266991 ]
Stefan Bodewig edited comment on COMPRESS-429 at 11/27/17 4:06 PM:
-------------------------------------------------------------------
Many thanks Damiano.
What may have been unclear from my response is that WinZip is the only archiver
I'm aware of that sets the Unicode extra fields - and its implementation has
some known problems. I really don't share your view that "Only the so-called
Unicode extra field (if present) can be trusted". I for one only trust the EFS
flag.
If I understand correctly you want to have a method that tells you where the
name came from, not whether it contains any Unicode characters not found in
CP437, right? The name {{hasUnicodeName}} doesn't really work for me. Maybe add
an enum and methods like {{getNameSource}} with EXTRA_FIELD (maybe even more
specific like UNICODE_EXTRA_FIELD), NAME_WITH_EFS_FLAG and NAME as possible
outcomes?
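
To make the suggestion concrete, here is a minimal sketch of what such an enum and a lookup could look like. It is not the Commons Compress API: the class {{NameSourceSketch}}, the method {{classify}} and its two parameters are hypothetical names invented for illustration, and the precedence (a name actually taken from the Unicode extra field wins over the EFS flag) is an assumption. Only the enum constants come from the comment above, and bit 11 of the general purpose bit flag is the EFS flag.

```java
/** Hypothetical sketch only; none of these names are Commons Compress API. */
final class NameSourceSketch {

    /** Where an entry's name came from (constants as suggested in the comment). */
    enum NameSource {
        /** Name field of the entry, EFS flag not set. */
        NAME,
        /** Name field of the entry, EFS flag set (name declared to be UTF-8). */
        NAME_WITH_EFS_FLAG,
        /** Name was taken from the Unicode extra field. */
        UNICODE_EXTRA_FIELD
    }

    /** Bit 11 of the general purpose bit flag, the language encoding (EFS) flag. */
    static final int EFS_FLAG = 1 << 11;

    /**
     * Assumption: the reader already knows whether it replaced the name with
     * the content of the Unicode extra field; that case takes precedence.
     */
    static NameSource classify(int generalPurposeBitFlag,
                               boolean nameTakenFromUnicodeExtraField) {
        if (nameTakenFromUnicodeExtraField) {
            return NameSource.UNICODE_EXTRA_FIELD;
        }
        if ((generalPurposeBitFlag & EFS_FLAG) != 0) {
            return NameSource.NAME_WITH_EFS_FLAG;
        }
        return NameSource.NAME;
    }

    public static void main(String[] args) {
        System.out.println(classify(0, false));
        System.out.println(classify(EFS_FLAG, false));
        System.out.println(classify(0, true));
    }
}
```

A caller that only trusts the EFS flag would accept {{NAME_WITH_EFS_FLAG}} and fall back to its own charset detection for the other two values, while a caller that trusts the extra field would do the opposite. The enum leaves that decision to the caller.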
was (Author: bodewig):
Many thanks Damiano.
What may have been unclear from my response is that WinZip is the only archiver
I'm aware of that sets the Unicode extra fields - and its implementation has
some known problems. I really don't share your view that "Only the so-called
Unicode extra field (if present) can be trusted". I for one only trust the EFS
flag.
If I understand correctly you want to have a method that tells you where the
name came from, not whether it contains any Unicode characters not found in
CP437, right? The name {{hasUnicodeName}} doesn't really work for me. Maybe add
an enum and methods like {{getNameSource}} with EXTRA_FIELD, NAME_WITH_EFS_FLAG
and NAME as possible outcomes?
> Expose whether ZIP entry name & comment come from Unicode extra field
> ---------------------------------------------------------------------
>
> Key: COMPRESS-429
> URL: https://issues.apache.org/jira/browse/COMPRESS-429
> Project: Commons Compress
> Issue Type: Improvement
> Reporter: Damiano Albani
> Priority: Minor
> Labels: Unicode, ZIP
>
> It is a known fact that detecting the encoding of the name/comment of ZIP
> entries is a messy process, and that general purpose bit 11 is often
> unreliable.
> Only the so-called Unicode extra field (if present) can be trusted to
> reliably determine a ZIP entry name & comment, as far as I understand.
> But the current API of Commons Compress doesn't (easily) expose which of
> these situations the ZIP archive reader is in.
> That's why I propose to add a couple of new getter/setter-exposed fields to
> {{ZipArchiveEntry}}, e.g.:
> {noformat}
> boolean hasUnicodeName
> boolean hasUnicodeComment
> {noformat}
> This way it can be easily determined if the value returned by
> {{ZipArchiveEntry::getName}} or {{ZipArchiveEntry::getComment}} can be
> trusted, or if it needs some "character encoding sniffing" of sorts.
> What do you think?