Hi Sherman,

Did you consider modifying the new ZipPath constructor you added to accept a boolean value for UTF-8 encoding?
If so, you can more clearly document the behaviour and avoid duplication of the operators in ZipFileSystem, e.g.:

  return new ZipPath(this, first, zc.isUTF8());

Paul.

> On 27 May 2016, at 22:38, Xueming Shen <xueming.s...@oracle.com> wrote:
>
> Hi,
>
> Please help review the change for JDK-8061777.
>
> issue: https://bugs.openjdk.java.net/browse/JDK-8061777
> webrev: http://cr.openjdk.java.net/~sherman/8061777
>
> Cause: ZipPath/ZipFileSystem uses byte[] as the internal underlying storage
> for entry names (for better performance, as the "name" is stored as bytes
> inside the zip/jar file, it is desirable to avoid the redundant
> String<->byte[] conversion, if possible). With this design, it is natural to
> also work on byte[] directly for those "path" operations, including the
> "normalization", which mainly is to remove the redundant "/" and switch the
> "\" to "/". This appears to be a problem for non-utf8 encoded zip files (utf8
> is the default encoding used to de/encode the entry name for the Java jar/zip
> APIs), especially those double-byte encodings that have 0x5c ('\') as one of
> the double-byte bytes. The 0x5c byte will be mistakenly treated as '\' and
> normalized to '/' if we normalize on the byte[] directly. The proposed change
> here is to normalize on the "String" to avoid this problem. Given the fact
> that Java jar/zip is specified to use utf-8 by default, to avoid the
> potential performance risk/cost for most of the zip/jar files (if we switch
> completely to the String based operation) the utf-8/byte[] path is still
> being used (as the default) when the encoding is utf-8. The implementation
> only switches to the "String based" code path when the encoding is
> specifically specified as "non-utf8", which should be rare.
>
> Thanks,
> Sherman
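
For illustration, here is a minimal sketch of the problem described in the quoted message. It is not the ZipPath/ZipFileSystem code: the class and the two normalize methods are hypothetical stand-ins, and it assumes the Shift_JIS charset is available in the running JDK. It relies on the character U+8868 encoding to the bytes 0x95 0x5c in Shift_JIS, so the second byte looks like '\' to a byte-level scan.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of the JDK zipfs implementation.
class BackslashNormalizationDemo {

    // Byte-level normalization (assumed simplification): every 0x5c byte
    // is treated as '\' and switched to '/'.
    static byte[] normalizeBytes(byte[] name) {
        byte[] out = name.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] == '\\') {
                out[i] = '/';
            }
        }
        return out;
    }

    // String-based normalization: decode with the zip's charset first,
    // then replace only real '\' characters.
    static String normalizeString(byte[] name, Charset cs) {
        return new String(name, cs).replace('\\', '/');
    }

    public static void main(String[] args) {
        // Assumption: Shift_JIS is present (it ships with standard JDK builds).
        Charset sjis = Charset.forName("Shift_JIS");
        String entry = "\u8868.txt";          // first char encodes to 0x95 0x5c
        byte[] raw = entry.getBytes(sjis);

        String viaBytes = new String(normalizeBytes(raw), sjis);
        String viaString = normalizeString(raw, sjis);

        System.out.println("byte-level result intact:   " + entry.equals(viaBytes));
        System.out.println("String-based result intact: " + entry.equals(viaString));
    }
}
```

The byte-level pass rewrites the trail byte of the double-byte character and corrupts the name, while decoding first leaves it untouched. For UTF-8 the byte-level pass is safe, since the bytes of a multi-byte UTF-8 sequence are never in the ASCII range.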