[ https://issues.apache.org/jira/browse/COMPRESS-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Bodewig updated COMPRESS-51:
-----------------------------------

    Fix Version/s: 1.0

> Enable creation of tool-readable ZIP archives with file names containing non-ASCII characters
> ---------------------------------------------------------------------------------------------
>
>                 Key: COMPRESS-51
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-51
>             Project: Commons Compress
>          Issue Type: Improvement
>         Environment: Any / All
>            Reporter: Christian Gosch
>            Assignee: Stefan Bodewig
>             Fix For: 1.0
>
>         Attachments: commons-compress-utf8-creation-svn741897.patch, utf8-7zip-test.zip, utf8-winzip-test.zip
>
>
> Currently it is not possible to generate ZIP archives that external tools can
> read, using either java.util.zip.* or org.apache.commons.compress.*, when the
> entries to include have names with characters outside US-ASCII. This should
> be changed so that at least org.apache.commons.compress.* can produce ZIP
> archives for an international context that common ZIP archivers such as
> pkzip, gzip, WinZip, PowerArchiver, WinRAR / rar, StuffIt... can read.
> For java.util.zip.* this is due to a very old flaw in the handling of entry
> names: they are always encoded as UTF-8, which is more or less Java-specific,
> rather than as Cp437, which most (possibly all) ZIP archivers expect and
> write. For more details see:
> http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4244499
> http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4820807
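To make the difference concrete, here is a minimal sketch (not part of the original report) that encodes the same entry name with both charsets and shows what a tool assuming Cp437 would display for UTF-8 name bytes. It assumes a full JDK, where Cp437 is available under the charset name "IBM437"; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class: compares the raw bytes of one entry name in two charsets.
class EntryNameBytes {
    // Assumption: Cp437 is registered as "IBM437" (shipped with the extended
    // charsets of a full JDK; minimal runtimes may lack it).
    static final Charset CP437 = Charset.forName("IBM437");

    static byte[] utf8(String name) {
        return name.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] cp437(String name) {
        return name.getBytes(CP437);
    }

    // What a tool that assumes Cp437 shows for UTF-8 encoded name bytes.
    static String utf8MisreadAsCp437(String name) {
        return new String(utf8(name), CP437);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Grüße.txt";
        System.out.println("UTF-8 bytes : " + hex(utf8(name)));
        System.out.println("Cp437 bytes : " + hex(cp437(name)));
        System.out.println("UTF-8 shown by a Cp437 tool: " + utf8MisreadAsCp437(name));
    }
}
```

Each of "ü" and "ß" takes two bytes in UTF-8 but only one in Cp437, so a Cp437-based tool displays two unrelated glyphs in place of each original letter.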
> For org.apache.commons.compress.archivers.zip.* the "compress & save"
> operation can easily be improved by extending ZipArchive:
> // Add member:
>     protected String m_encoding = null;
> // Add constructor:
>     public ZipArchive(String encoding) {
>         m_encoding = encoding;
>     }
> // Extend doSave(FileOutputStream):
> // ...
>         // Pack-Operation
>         ZipOutputStream out = null;
>         try {
>             out = new ZipOutputStream(new BufferedOutputStream(output));
>             if (m_encoding != null) {          // added
>                 out.setEncoding(m_encoding);   // added
>             }                                  // added
>             while (iterator.hasNext()) {
> // ...
> Now it is possible to instantiate a ZipArchive with "Cp437" as its encoding,
> and external tools can recover the original entry names even if they contain
> non-ASCII characters. (On the other hand, Java cannot read back and extract
> such an archive, since it expects UTF-8!)
> The "read & extract" operation for ZipArchive is harder to extend, since it
> currently relies entirely on java.util.zip.*. The other reason is that ZIP
> archives contain no hint about the character encoding used for file names
> etc. It seems that the usual tools simply use Cp437 and Java simply uses
> UTF-8, without recording that choice anywhere. Thus a reader has to try the
> candidate encodings.
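One way a reader could try the candidates is sketched below: decode the raw name bytes strictly as UTF-8 and fall back to Cp437 if that fails. This heuristic is an illustrative assumption, not something the report or any ZIP tool prescribes; it again assumes Cp437 is available as "IBM437", and the names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: guesses the encoding of a raw ZIP entry name.
class EntryNameGuesser {
    // Assumption: Cp437 is available as "IBM437" (full JDK).
    static final Charset CP437 = Charset.forName("IBM437");

    static String decodeName(byte[] raw) {
        // A strict decoder throws instead of silently inserting U+FFFD.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume the Cp437 convention of most ZIP tools.
            // new String(...) never throws; Cp437 maps every byte value.
            return new String(raw, CP437);
        }
    }

    public static void main(String[] args) {
        byte[] fromJava = {'a', (byte) 0xC3, (byte) 0xA4}; // "aä" in UTF-8
        byte[] fromTool = {'a', (byte) 0x84};              // "aä" in Cp437
        System.out.println(decodeName(fromJava));
        System.out.println(decodeName(fromTool));
    }
}
```

The guess is not infallible: a Cp437 name whose bytes happen to form valid UTF-8 (for example C3 A4, which Cp437 displays as "├ñ") would be read as "ä".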
> For TarArchive the situation is unclear. As far as I can see, the
> commons-compress implementation does not rely on third-party code here, and
> TAR is not a Java-bound file type (unlike JAR, which is). So chances are
> that everything works well, even when entry names with non-ASCII characters
> come into play.

-- 
This message is automatically generated by JIRA.