[ 
https://issues.apache.org/jira/browse/VFS-637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16057955#comment-16057955
 ] 

Bernd Eckenfels commented on VFS-637:
-------------------------------------

What do you think about StandardCharsets.US_ASCII or ISO_8859_1 (Latin-1) as the
default, or would you leave it absent (which throws for archives not marked as
UTF-8)?
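
To make the trade-off between these candidate defaults concrete, here is a minimal sketch using only java.util.zip (not VFS code). The class name ZipCharsetDemo and its helper methods are hypothetical, and it assumes the running JDK ships the IBM437 charset. It writes an archive whose entry name is encoded in CP437 and reads it back with two different charsets:

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of commons-vfs2.
class ZipCharsetDemo {

    /** Writes a one-entry archive whose entry name is encoded with the given charset. */
    public static File writeZip(String entryName, Charset nameCharset) {
        try {
            File file = File.createTempFile("legacy", ".zip");
            file.deleteOnExit();
            try (ZipOutputStream out =
                     new ZipOutputStream(Files.newOutputStream(file.toPath()), nameCharset)) {
                out.putNextEntry(new ZipEntry(entryName));
                out.write("hello".getBytes(StandardCharsets.US_ASCII));
                out.closeEntry();
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Opens the archive with an explicit charset and returns the first entry name. */
    public static String firstEntryName(File file, Charset charset) {
        try (ZipFile zip = new ZipFile(file, charset)) {
            return zip.entries().nextElement().getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: IBM437 is available in this JDK.
        Charset cp437 = Charset.forName("IBM437");
        File zip = writeZip("M\u00fcller.txt", cp437);

        // Reading with the charset the names were written in recovers the name.
        System.out.println(firstEntryName(zip, cp437));

        // Latin-1 maps every byte to some character, so it does not throw,
        // but the CP437 byte for 'ü' (0x81) decodes to a control character.
        System.out.println(firstEntryName(zip, StandardCharsets.ISO_8859_1));
    }
}
```

In this sketch a Latin-1 default never fails on legacy names but can silently return a wrong name, which is the cost to weigh against a default that throws.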

> Zip files with legacy encoding and special characters let VFS crash
> -------------------------------------------------------------------
>
>                 Key: VFS-637
>                 URL: https://issues.apache.org/jira/browse/VFS-637
>             Project: Commons VFS
>          Issue Type: Bug
>         Environment: Windows 10 64 Bit, Java 8
>            Reporter: Guido Schnepp
>              Labels: easyfix
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Oracle reworked the ZipFile class in Java 7. Since then, the default 
> constructor used by commons-vfs2 2.1 is more restrictive than in Java 6. 
> The ZipFile constructor now has a second parameter (Charset) for 
> explicitly specifying the legacy charset to use if the ZIP file 
> doesn't state its UTF-8 compliance internally. This affects all ZIP files 
> that use a legacy charset for filename encoding rather than UTF-8, which is 
> common today. An example would be a ZIP file whose archived filenames 
> contain German umlauts or Russian characters.
> To support this new parameter with (more or less) default values, the class 
> org.apache.commons.vfs2.provider.zip.ZipFileSystem has to be extended with a 
> default charset parameter, getter, or setter (as you like) that forwards this 
> setting to the java.util.zip.ZipFile constructor.
> A quick workaround for me was to create a new OwnZipFileProvider referring to 
> an equally new OwnZipFileSystem (extending ZipFileSystem) with the following 
> modified function. The change is highlighted:
>     protected ZipFile createZipFile(final File file) throws FileSystemException {
>         try {
>             // Changed: pass an explicit legacy charset as the second argument.
>             return new ZipFile(file, Charset.forName("IBM437"));
>         } catch (final IOException ioe) {
>             throw new FileSystemException("vfs.provider.zip/open-zip-file.error", file, ioe);
>         }
>     }
> Presetting charset 437 as the legacy default charset seems to be a good 
> workaround, as stated in Appendix D here: 
> https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT :
> "D.1 The ZIP format has historically supported only the original IBM PC 
> character encoding set, commonly referred to as IBM Code Page 437.  This 
> limits storing file name characters to only those within the original MS-DOS 
> range of values and does not properly support file names in other character 
> encodings, or languages. [...]"
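
The description above refers to an archive stating its UTF-8 compliance internally. In the ZIP format this is bit 11 of each entry's general purpose bit flag. The following is a minimal sketch of how that flag can be inspected; the class ZipUtf8FlagDemo and its methods are hypothetical, it assumes the IBM437 charset is available, and it reads the first local file header for brevity, whereas java.util.zip.ZipFile consults the central directory:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; not part of commons-vfs2.
class ZipUtf8FlagDemo {

    /** Bit 11 of the general purpose bit flag: names and comments are UTF-8. */
    static final int UTF8_FLAG = 1 << 11;

    /** Builds an in-memory archive with one empty entry, encoding its name with the given charset. */
    public static byte[] zipBytes(String entryName, Charset nameCharset) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buffer, nameCharset)) {
            out.putNextEntry(new ZipEntry(entryName));
            out.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Returns true if the first local file header has the UTF-8 flag set. */
    public static boolean firstEntryMarkedUtf8(byte[] zip) {
        // Local file header layout: signature (4 bytes), version needed (2 bytes),
        // general purpose bit flag (2 bytes, little-endian) at offset 6.
        int flags = (zip[6] & 0xff) | ((zip[7] & 0xff) << 8);
        return (flags & UTF8_FLAG) != 0;
    }

    public static void main(String[] args) {
        String name = "M\u00fcller.txt";
        System.out.println(firstEntryMarkedUtf8(zipBytes(name, StandardCharsets.UTF_8)));
        // Assumption: IBM437 is available in this JDK.
        System.out.println(firstEntryMarkedUtf8(zipBytes(name, Charset.forName("IBM437"))));
    }
}
```

Archives without this flag are the ones for which the charset passed to the ZipFile constructor matters.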



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
