[ https://issues.apache.org/jira/browse/COMPRESS-51?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Bodewig updated COMPRESS-51:
-----------------------------------
Fix Version/s: 1.0
> Enable creation of tool-readable ZIP archives with file names containing
> non-ASCII characters
> ---------------------------------------------------------------------------------------------
>
> Key: COMPRESS-51
> URL: https://issues.apache.org/jira/browse/COMPRESS-51
> Project: Commons Compress
> Issue Type: Improvement
> Environment: Any / All
> Reporter: Christian Gosch
> Assignee: Stefan Bodewig
> Fix For: 1.0
>
> Attachments: commons-compress-utf8-creation-svn741897.patch,
> utf8-7zip-test.zip, utf8-winzip-test.zip
>
>
> Currently it is not possible to generate externally readable ZIP archives
> with java.util.zip.* or org.apache.commons.compress.* when the entries to
> include have names with characters outside US-ASCII. This should be changed
> so that at least org.apache.commons.compress.* can produce ZIP archives in
> an international context that are readable by common ZIP archiver tools such
> as pkzip, gzip, WinZIP, PowerArchiver, WinRAR / rar, StuffIt...
> For java.util.zip.* this is due to a very old flaw in the handling of entry
> names: they are always written as UTF-8, which is rather Java-specific, and
> not as Cp437, which is expected and written by most (possibly all) ZIP
> archiver tools. For more details see:
> http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4244499
> http://bugs.sun.com/bugdatabase/view_bug.do;:YfiG?bug_id=4820807
> For org.apache.commons.compress.archivers.zip.* the "compress & save"
> operation can be easily improved by extending ZipArchive:
> // Add member:
> protected String m_encoding = null;
>
> // Add constructor:
> public ZipArchive(String encoding) {
>     m_encoding = encoding;
> }
>
> // Extend doSave(FileOutputStream):
>     // ...
>     // Pack operation
>     ZipOutputStream out = null;
>     try {
>         out = new ZipOutputStream(new BufferedOutputStream(output));
>         if (m_encoding != null) {         // added
>             out.setEncoding(m_encoding);  // added
>         }                                 // added
>         while (iterator.hasNext()) {
>             // ...
> Now it is possible to instantiate a ZipArchive with "Cp437" as encoding, and
> external tools can recover the original entry names even if they contain
> non-ASCII characters. (On the other hand, Java cannot read back & extract
> such an archive, since it expects UTF-8!)
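To illustrate why the encoding choice matters, here is a minimal Java sketch
showing that the same entry name produces different bytes under UTF-8 and
Cp437. The class and method names (EntryNameBytes, encodeHex) are made up for
this illustration and are not part of Commons Compress. It assumes the JVM
provides the IBM437 (Cp437) charset, which full JDKs normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EntryNameBytes {

    // Hypothetical helper: encode a name with the given charset, return lowercase hex.
    static String encodeHex(String name, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : name.getBytes(charset)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "\u00fc.txt"; // "ü.txt"
        // Assumption: the IBM437 charset is available in this JVM.
        Charset cp437 = Charset.forName("IBM437");

        System.out.println("UTF-8: " + encodeHex(name, StandardCharsets.UTF_8));
        System.out.println("Cp437: " + encodeHex(name, cp437));
    }
}
```

A tool that reads the UTF-8 bytes as Cp437 (or the other way round) will show
a garbled name, which is the interoperability problem described above.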
> The "read & extract" operation for ZipArchive is more difficult to extend,
> since it currently relies completely on java.util.zip.*. The other reason
> is that ZIP archives do not contain any hint about the character encoding
> used for file names etc. It seems that the usual tools simply use Cp437 and
> Java simply uses UTF-8, without any stated reason. Thus an extractor has
> to guess.
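One possible way to "guess" could look like the sketch below: try a strict
UTF-8 decode of the raw name bytes first and fall back to Cp437 if that
fails. This is only a heuristic under stated assumptions, not what Commons
Compress or any archiver actually does. EntryNameGuesser and decodeName are
hypothetical names, and the IBM437 charset is assumed to be available in the
JVM.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EntryNameGuesser {

    // Hypothetical helper: decode raw entry-name bytes by guessing the encoding.
    static String decodeName(byte[] raw) {
        try {
            // Strict decoding: report malformed input instead of replacing it.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: assume Cp437 (assumption: charset is available).
            return new String(raw, Charset.forName("IBM437"));
        }
    }

    public static void main(String[] args) {
        byte[] cp437Name = {(byte) 0x81, '.', 't', 'x', 't'};            // "ü.txt" in Cp437
        byte[] utf8Name = {(byte) 0xc3, (byte) 0xbc, '.', 't', 'x', 't'}; // "ü.txt" in UTF-8
        System.out.println(decodeName(cp437Name));
        System.out.println(decodeName(utf8Name));
    }
}
```

The guess can be wrong: a Cp437 name whose bytes happen to form valid UTF-8
would be decoded as UTF-8, so this cannot replace an explicit encoding
declaration.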
> For TarArchive the situation is unclear. Here the commons-compress
> implementation does not rely on third-party code as far as I can see, and
> TAR is not a Java-specific file type (unlike JAR). So chances are that
> everything works well, even when entry names with non-ASCII characters
> come into play.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.