[ 
https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698272#comment-16698272
 ] 

Stefan Bodewig commented on COMPRESS-471:
-----------------------------------------

(I've taken the liberty of reformatting your code)

The main problem with your code is that an archive has no way to record which 
encoding was used when it was created. {{getEncoding}} returns the encoding 
you specified as a constructor argument; in the absence of such an argument 
it will always be UTF-8.

I'm not really sure what you are trying to achieve with your test. The method 
will return true if the very last entry uses the EFS flag.

If you are willing to rely on the EFS flag (the {{usesUTF8ForNames}} bit of the 
GPB) then specifying CP850 will just do what you need. For any entry that 
carries this bit, {{ZipFile}} will use UTF-8 and ignore your explicit encoding 
anyway.
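
The rule described above can be sketched in plain Java. This is only an 
illustration of the decision and not the actual Commons Compress 
implementation; the class {{EfsNameDecoder}} and its methods are hypothetical 
names. It assumes the EFS flag is bit 11 of the general purpose bit flag, as 
the ZIP specification defines it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrative only; not Commons Compress code. All names here are hypothetical. */
final class EfsNameDecoder {

    /** Bit 11 of the general purpose bit flag: the language encoding flag (EFS). */
    static final int EFS_FLAG = 1 << 11;

    private EfsNameDecoder() {
    }

    /** Returns true if the general purpose bit flag marks the entry name as UTF-8. */
    static boolean usesUtf8ForNames(int generalPurposeBitFlag) {
        return (generalPurposeBitFlag & EFS_FLAG) != 0;
    }

    /**
     * Decodes raw entry-name bytes: UTF-8 when the EFS flag is set,
     * otherwise the caller-supplied fallback charset (for example Cp850).
     */
    static String decodeName(byte[] rawName, int generalPurposeBitFlag, Charset fallback) {
        Charset charset = usesUtf8ForNames(generalPurposeBitFlag)
                ? StandardCharsets.UTF_8
                : fallback;
        return new String(rawName, charset);
    }
}
```

A caller would pass something like {{Charset.forName("Cp850")}} as the 
fallback, assuming the JDK in use ships that charset. Entries with the flag 
set are decoded as UTF-8 regardless of the fallback, which is why an explicit 
CP850 should be harmless for archives that mix both kinds of entries.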

> Zipped files names having non UTF-8 encoding are being replaced with '?' 
> while previewing file.
> -----------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-471
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-471
>             Project: Commons Compress
>          Issue Type: Bug
>    Affects Versions: 1.18
>            Reporter: Gaurav Mittal
>            Priority: Major
>         Attachments: Document(▒Γ║╗)_20150226_11.zip, Incorrect.JPG, 
> correct.JPG
>
>
> All strings that cannot be decoded as UTF-8 are replaced by the '?' symbol.
> In the issue scenario the charset is 'Cp850'. Since the Commons Compress 
> library cannot identify the 'Cp850' charset, it falls back to the default 
> charset 'UTF-8', and therefore we see the '?' symbol.
> In our code:
> ZipFile ret = new ZipFile(path);
> If we instead pass the encoding explicitly as shown below, it works fine:
> ZipFile ret = new ZipFile(new File(path), "Cp850", false);
> But this second approach, where we force the encoding to 'Cp850', may cause 
> side effects in some cases.
>  --------------------------------------------------------------------------
> The code below does not seem to resolve the UTF-8 conflicts and does not 
> produce the correct file names:
>  
> try {
>     final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag =
>         populateFromCentralDirectory();
>     resolveLocalFileHeaderData(entriesWithoutUTF8Flag);
>     success = true;
> } finally {
>     closed = !success;
>     if (!success && closeOnError) {
>         IOUtils.closeQuietly(archive);
>     }
> }



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
