Hi,

Please help review the change for JDK-8061777.

issue: https://bugs.openjdk.java.net/browse/JDK-8061777
webrev: http://cr.openjdk.java.net/~sherman/8061777

Cause: ZipPath/ZipFileSystem uses byte[] as the internal storage for entry
names. The name is stored as bytes inside the zip/jar file, so for better
performance it is desirable to avoid the redundant String<->byte[]
conversion where possible. With this design it is natural to also perform
the "path" operations directly on the byte[], including "normalization",
which mainly removes redundant "/" and converts "\" to "/".

This is a problem for zip files whose entry names are not encoded in utf-8
(utf-8 is the default encoding used to de/encode entry names in the Java
jar/zip APIs), especially those in double-byte encodings that can have 0x5c
('\') as one of the two bytes of a character. If we normalize on the byte[]
directly, such a 0x5c byte is mistaken for '\' and converted to '/', which
corrupts the entry name.

Proposed change: normalize on the "String" instead to avoid this problem.
Since Java jar/zip is specified to use utf-8 by default, the utf-8/byte[]
path is still used (as the default) when the encoding is utf-8, to avoid the
potential performance risk/cost that a complete switch to String based
operations would bring for most zip/jar files. The byte[] path is safe there
because in utf-8 a 0x5c byte is always the ASCII '\'. The implementation only
switches to the "String based" code path when a non-utf8 encoding is
explicitly specified, which should be rare.
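
To illustrate the failure mode, here is a minimal standalone sketch. It is
not the code in the webrev: the class and method names are hypothetical, and
it assumes the Shift_JIS charset is available in the running JDK. In
Shift_JIS the katakana character U+30BD is encoded as the two bytes
0x83 0x5c, so a byte-level replace of 0x5c damages it, while normalizing the
decoded String does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BackslashByteDemo {

    // Hypothetical byte-level normalization: replaces every 0x5c with '/'.
    // This is only safe when 0x5c can never be part of a multi-byte character.
    static byte[] normalizeBytes(byte[] name) {
        byte[] out = name.clone();
        for (int i = 0; i < out.length; i++) {
            if (out[i] == (byte) '\\') {
                out[i] = (byte) '/';
            }
        }
        return out;
    }

    // String-level normalization: only real '\' characters are replaced.
    static String normalizeString(String name) {
        return name.replace('\\', '/');
    }

    // Sketch of the dispatch described above: keep the byte[] path for
    // utf-8, fall back to the String path for any other encoding.
    static String normalize(byte[] name, Charset cs) {
        if (StandardCharsets.UTF_8.equals(cs)) {
            return new String(normalizeBytes(name), cs);
        }
        return normalizeString(new String(name, cs));
    }

    public static void main(String[] args) {
        // Assumption: the JDK in use provides the Shift_JIS charset.
        Charset sjis = Charset.forName("Shift_JIS");
        String original = "\u30bd.txt";
        byte[] raw = original.getBytes(sjis);

        // The second byte of the encoded character is 0x5c.
        System.out.printf("first two bytes: %02x %02x%n",
                raw[0] & 0xff, raw[1] & 0xff);

        // Byte-level normalization rewrites that byte and breaks the name.
        String viaBytes = new String(normalizeBytes(raw), sjis);
        System.out.println("byte[] path keeps name: "
                + viaBytes.equals(original));

        // The encoding-aware path leaves the name intact.
        System.out.println("String path keeps name: "
                + normalize(raw, sjis).equals(original));
    }
}
```

The dispatch in normalize() mirrors the trade-off described above: utf-8
names never pay for the extra String round trip, and only names in other
encodings take the slower but correct path.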

Thanks,
Sherman
