[ https://issues.apache.org/jira/browse/VFS-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057955#comment-16057955 ]
Bernd Eckenfels commented on VFS-637:
-------------------------------------

What do you think about StandardCharsets.US_ASCII or ISO_8859_1 (Latin-1) as the default, or would you leave it absent (which throws for archives not marked as UTF-8)?

> Zip files with legacy encoding and special characters let VFS crash
> -------------------------------------------------------------------
>
>                 Key: VFS-637
>                 URL: https://issues.apache.org/jira/browse/VFS-637
>             Project: Commons VFS
>          Issue Type: Bug
>         Environment: Windows 10 64 Bit, Java 8
>            Reporter: Guido Schnepp
>              Labels: easyfix
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Oracle reworked the ZipFile class in Java 7. Since then, the one-argument
> constructor used by commons-vfs2 2.1 is more restrictive than it was in
> Java 6. The ZipFile constructor now accepts a second parameter (Charset)
> that explicitly specifies the legacy charset to use when the ZIP file does
> not declare its entry names as UTF-8. This affects all ZIP files that encode
> file names in a legacy charset rather than in UTF-8, as is common today.
> Examples are ZIP files whose archived file names contain German umlauts or
> Russian characters.
>
> To support this new parameter with (more or less) sensible default values,
> the class org.apache.commons.vfs2.provider.zip.ZipFileSystem has to be
> extended with a default charset parameter and a getter or setter (as you
> like) that forwards this setting to the java.util.zip.ZipFile constructor.
>
> My quick workaround was to create a new OwnZipFileProvider that refers to
> an equally new OwnZipFileSystem (extending ZipFileSystem) with the following
> modified function.
> The change is marked with a comment:
> {{
> protected ZipFile createZipFile(final File file) throws FileSystemException {
>     try {
>         // Changed: pass the legacy charset explicitly as the second argument.
>         return new ZipFile(file, Charset.forName("IBM437"));
>     } catch (final IOException ioe) {
>         throw new FileSystemException("vfs.provider.zip/open-zip-file.error", file, ioe);
>     }
> }
> }}
> Presetting code page 437 as the legacy default charset seems to be a good
> workaround, as stated in Appendix D here:
> https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT :
> "D.1 The ZIP format has historically supported only the original IBM PC
> character encoding set, commonly referred to as IBM Code Page 437. This
> limits storing file name characters to only those within the original MS-DOS
> range of values and does not properly support file names in other character
> encodings, or languages. [...]"

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
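
For anyone who wants to reproduce the charset handling outside of VFS, the following is a minimal sketch using only java.util.zip. It assumes a JDK that provides the IBM437 charset (standard JDK distributions do). The class LegacyZipNames and its helper methods are hypothetical names made up for this illustration; they are not part of Commons VFS or of the reporter's workaround. The sketch writes an archive whose entry name is encoded in code page 437 (so the UTF-8 flag is not set) and reads it back through the two-argument ZipFile constructor.

```java
import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class, not part of Commons VFS.
class LegacyZipNames {

    // Legacy default proposed in the issue (APPNOTE.TXT, Appendix D).
    public static final Charset LEGACY = Charset.forName("IBM437");

    // Writes a one-entry archive whose entry name is encoded with nameCharset.
    public static void writeZip(File target, String entryName, Charset nameCharset)
            throws IOException {
        try (OutputStream out = Files.newOutputStream(target.toPath());
             ZipOutputStream zip = new ZipOutputStream(out, nameCharset)) {
            zip.putNextEntry(new ZipEntry(entryName));
            zip.write("hello".getBytes(nameCharset));
            zip.closeEntry();
        }
    }

    // Lists entry names; names not flagged as UTF-8 are decoded with nameCharset.
    public static List<String> listNames(File archive, Charset nameCharset)
            throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive, nameCharset)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        File archive = File.createTempFile("legacy-names", ".zip");
        archive.deleteOnExit();
        // Unicode escapes for a-umlaut, o-umlaut, u-umlaut keep the source ASCII-only.
        writeZip(archive, "Umlaut-\u00e4\u00f6\u00fc.txt", LEGACY);
        System.out.println(listNames(archive, LEGACY));
    }
}
```

Opening the same archive with the one-argument ZipFile constructor is the case the issue reports as failing. The exact failure (exception type and whether it surfaces on open or on enumeration) appears to differ between Java versions, so the sketch does not rely on it.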