Guido Schnepp created VFS-637:
---------------------------------

             Summary: Zip files with legacy encoding and special characters let VFS crash
                 Key: VFS-637
                 URL: https://issues.apache.org/jira/browse/VFS-637
             Project: Commons VFS
          Issue Type: Bug
         Environment: Windows 10 64 Bit, Java 8
            Reporter: Guido Schnepp


Oracle reworked the ZipFile class in Java 7. Since then, the single-argument
constructor used by commons-vfs2 2.1 is more restrictive than it was in Java 6:
it decodes entry names as UTF-8 and fails on names that are not valid UTF-8.
The ZipFile constructor now has a second parameter (Charset) for explicitly
specifying the legacy charset to use when the archive does not mark its entry
names as UTF-8 internally. This affects all ZIP files that encode file names in
a legacy charset rather than in UTF-8, which is common today. Examples are ZIP
files whose archived file names contain German umlauts or Russian characters.

To support this new parameter with (more or less) sensible default values, the
class org.apache.commons.vfs2.provider.zip.ZipFileSystem has to be extended by
a default charset setting (a constructor parameter or a getter/setter, whichever
is preferred) that is forwarded to the java.util.zip.ZipFile constructor.
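The getter/setter variant of this proposal could look roughly like the sketch
below. CharsetAwareZipOpener is a hypothetical stand-in, not Commons VFS API; it
only shows a charset property, defaulting to UTF-8 like ZipFile(File), being
forwarded to the two-argument ZipFile constructor.

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Objects;
import java.util.zip.ZipFile;

// Hypothetical sketch of the proposed setting; not part of the Commons VFS API.
public class CharsetAwareZipOpener {

    // Defaults to UTF-8, matching the single-argument ZipFile(File) constructor.
    private Charset charset = StandardCharsets.UTF_8;

    public Charset getCharset() {
        return charset;
    }

    // Rejects null so createZipFile always has a usable charset.
    public void setCharset(Charset charset) {
        this.charset = Objects.requireNonNull(charset, "charset");
    }

    // Counterpart of ZipFileSystem.createZipFile(File) with the charset forwarded.
    public ZipFile createZipFile(File file) throws IOException {
        return new ZipFile(file, charset);
    }
}
{code}

Inside VFS itself, such a setting could alternatively be passed through
FileSystemOptions, the mechanism VFS already uses for per-file-system settings.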

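As a quick workaround, I created a new OwnZipFileProvider that refers to a
likewise new OwnZipFileSystem (extending ZipFileSystem) with the following
modified method. The change is marked with a comment:

{code:java}
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.util.zip.ZipFile;

import org.apache.commons.vfs2.FileSystemException;

// Inside OwnZipFileSystem extends ZipFileSystem:
@Override
protected ZipFile createZipFile(final File file) throws FileSystemException {
    try {
        // CHANGED: pass IBM437 explicitly as the charset for entry names that are
        // not flagged as UTF-8 (the single-argument constructor assumes UTF-8).
        return new ZipFile(file, Charset.forName("IBM437"));
    } catch (final IOException ioe) {
        throw new FileSystemException("vfs.provider.zip/open-zip-file.error", file, ioe);
    }
}
{code}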

Presetting code page 437 as the legacy default charset seems to be a good
workaround, as stated in Appendix D of the ZIP specification:
https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT :

"D.1 The ZIP format has historically supported only the original IBM PC 
character encoding set, commonly referred to as IBM Code Page 437.  This limits 
storing file name characters to only those within the original MS-DOS range of 
values and does not properly support file names in other character encodings, 
or  languages. [...]"





--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
