[
https://issues.apache.org/jira/browse/COMPRESS-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16697766#comment-16697766
]
Stefan Bodewig edited comment on COMPRESS-471 at 11/25/18 7:57 PM:
-------------------------------------------------------------------
Hi,
I found the solution below:
{code}
private boolean isUTF8Encoded(ZipFile zipFile) {
    boolean foundUTF8 = false;
    if (zipFile != null) {
        // Start from the encoding the ZipFile was opened with.
        foundUTF8 = zipFile.getEncoding().equalsIgnoreCase("UTF8");
        Enumeration<ZipArchiveEntry> list = zipFile.getEntries();
        if (list != null && list.hasMoreElements()) {
            // Only the first entry is inspected.
            ZipArchiveEntry entry = list.nextElement();
            if (entry != null) {
                // Use the general purpose bit (GPB) flag of that entry.
                foundUTF8 = entry.getGeneralPurposeBit().usesUTF8ForNames();
            }
        }
    }
    return foundUTF8;
}
{code}
If the method above returns false, I can use another ZipFile constructor with
the CP850 charset and get the desired file names.
Please let me know whether this approach is fine.
Thanks
(Author: gk.mittal)
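A related idea, sketched here only as an illustration and not as the commenter's or the library's method: instead of relying on a flag, try to decode the raw name bytes as strict UTF-8 and fall back to a legacy charset when that fails. The class and method names (EntryNameDecoder, decodeName) are hypothetical. The raw bytes would presumably come from ZipArchiveEntry.getRawName(), and the fallback charset is an assumption the caller has to supply. This is a heuristic: pure ASCII names decode the same either way, and some legacy byte sequences can happen to be valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EntryNameDecoder {

    /**
     * Decodes raw entry-name bytes as strict UTF-8 and, if the bytes are
     * not valid UTF-8, decodes them with the given fallback charset.
     * The fallback charset is an assumption supplied by the caller.
     */
    static String decodeName(byte[] rawName, Charset fallback) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    // Fail instead of silently substituting a replacement character.
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(rawName))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the legacy charset.
            return new String(rawName, fallback);
        }
    }

    public static void main(String[] args) {
        String name = "D\u00f6cument.txt";
        byte[] utf8Bytes = name.getBytes(StandardCharsets.UTF_8);
        byte[] legacyBytes = name.getBytes(StandardCharsets.ISO_8859_1);

        // ISO-8859-1 stands in for the legacy charset in this demo.
        System.out.println(decodeName(utf8Bytes, StandardCharsets.ISO_8859_1));
        System.out.println(decodeName(legacyBytes, StandardCharsets.ISO_8859_1));
    }
}
```

For the archive in this issue the fallback would be Charset.forName("Cp850"), assuming the running JDK provides that charset.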
> Zipped files names having non UTF-8 encoding are being replaced with '?'
> while previewing file.
> -----------------------------------------------------------------------------------------------
>
> Key: COMPRESS-471
> URL: https://issues.apache.org/jira/browse/COMPRESS-471
> Project: Commons Compress
> Issue Type: Bug
> Affects Versions: 1.18
> Reporter: Gaurav Mittal
> Priority: Major
> Attachments: Document(▒Γ║╗)_20150226_11.zip, Incorrect.JPG,
> correct.JPG
>
>
> All characters in entry names that cannot be decoded as UTF-8 are replaced
> by the '?' symbol.
> In the issue scenario the charset is 'Cp850'. Commons Compress cannot
> detect the 'Cp850' charset and falls back to its default charset, 'UTF-8',
> so the '?' symbol appears.
> Our code:
> ZipFile ret = new ZipFile(path);
> If we pass the encoding explicitly, as shown below, it works fine:
> ZipFile ret = new ZipFile(new File(path), "Cp850", false);
> However, forcing the encoding to 'Cp850' in this second scenario may cause
> side effects in some cases.
> --------------------------------------------------------------------------
> The code below does not seem to resolve UTF-8 conflicts and does not
> restore the file names to their correct form:
>
> try {
>     final Map<ZipArchiveEntry, NameAndComment> entriesWithoutUTF8Flag =
>         populateFromCentralDirectory();
>     resolveLocalFileHeaderData(entriesWithoutUTF8Flag);
>     success = true;
> } finally {
>     closed = !success;
>     if (!success && closeOnError) {
>         IOUtils.closeQuietly(archive);
>     }
> }