
Xueming Shen Wed, 27 Apr 2011 23:37:16 -0700

Hi

This is motivated by Neil's request to optimize the common-case UTF-8 path for native ZipFile.getEntry calls [1]. As I said in my reply [2], I believe a better approach might be to "patch" the UTF8 charset directly so that it implements the sun.nio.cs.ArrayDecoder/Encoder interfaces. That speeds up array-based encoding/decoding under certain circumstances, as we did for all single-byte charsets in #6636323 [3]. I have an old blog entry [4] with some data on this optimization.
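To illustrate the general idea, here is a minimal sketch of an array-to-array UTF-8 decoder with an ASCII fast path. The interface ArrayDecoderSketch and the classes below are hypothetical. They are loosely modelled on the idea behind sun.nio.cs.ArrayDecoder, not on its actual signature or on the code in the webrev. Any input the fast path does not handle (malformed, overlong or truncated sequences) makes it return -1, so that the caller can fall back to the regular decoder.

```java
import java.nio.charset.StandardCharsets;

final class Utf8ArrayDecoding {

    /** Hypothetical array-to-array decoder contract (illustrative only). */
    interface ArrayDecoderSketch {
        /** Decodes src[off, off+len) into dst from index 0; returns the number of
         *  chars written, or -1 if the input must go through the slow path. */
        int decode(byte[] src, int off, int len, char[] dst);
    }

    static final class Utf8Decoder implements ArrayDecoderSketch {
        @Override
        public int decode(byte[] src, int off, int len, char[] dst) {
            int sp = off, sl = off + len, dp = 0;
            // Fast path: a leading run of ASCII bytes maps one-to-one onto chars.
            while (sp < sl && src[sp] >= 0) {
                dst[dp++] = (char) src[sp++];
            }
            while (sp < sl) {
                int b1 = src[sp] & 0xFF;
                if (b1 < 0x80) {                               // 0xxxxxxx
                    dst[dp++] = (char) b1;
                    sp += 1;
                } else if (b1 >= 0xC2 && b1 <= 0xDF) {         // 110xxxxx 10xxxxxx
                    if (sp + 1 >= sl || !isCont(src[sp + 1])) return -1;
                    dst[dp++] = (char) (((b1 & 0x1F) << 6) | (src[sp + 1] & 0x3F));
                    sp += 2;
                } else if (b1 >= 0xE0 && b1 <= 0xEF) {         // 1110xxxx + 2 continuation bytes
                    if (sp + 2 >= sl || !isCont(src[sp + 1]) || !isCont(src[sp + 2])) return -1;
                    int cp = ((b1 & 0x0F) << 12) | ((src[sp + 1] & 0x3F) << 6) | (src[sp + 2] & 0x3F);
                    // Reject overlong forms and UTF-16 surrogate code points.
                    if (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;
                    dst[dp++] = (char) cp;
                    sp += 3;
                } else if (b1 >= 0xF0 && b1 <= 0xF4) {         // 11110xxx + 3 continuation bytes
                    if (sp + 3 >= sl || !isCont(src[sp + 1]) || !isCont(src[sp + 2])
                            || !isCont(src[sp + 3])) return -1;
                    int cp = ((b1 & 0x07) << 18) | ((src[sp + 1] & 0x3F) << 12)
                           | ((src[sp + 2] & 0x3F) << 6) | (src[sp + 3] & 0x3F);
                    // Reject overlong forms and values above U+10FFFF.
                    if (cp < 0x10000 || cp > 0x10FFFF) return -1;
                    dst[dp++] = Character.highSurrogate(cp);    // supplementary -> surrogate pair
                    dst[dp++] = Character.lowSurrogate(cp);
                    sp += 4;
                } else {
                    return -1;                                  // stray continuation or invalid lead byte
                }
            }
            return dp;
        }

        private static boolean isCont(byte b) {
            return (b & 0xC0) == 0x80;                          // 10xxxxxx
        }
    }

    /** Tries the array fast path first, falling back to the regular String decoder. */
    static String decodeUtf8(byte[] src) {
        char[] dst = new char[src.length];   // UTF-8 never yields more chars than bytes
        int n = new Utf8Decoder().decode(src, 0, src.length, dst);
        return n >= 0 ? new String(dst, 0, n) : new String(src, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9 \u20ac \uD834\uDD1E".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodeUtf8(utf8));
    }
}
```

The sketch keeps the fast path strict and simple. It only commits to well-formed input and leaves the replacement handling of malformed bytes to the existing CharsetDecoder-based path.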

The original plan was to do the same for our new UTF8 [5] in JDK7 as well, but then (excuse, excuse) I was just too busy to come back to this topic until 2 days ago. After two days of small tweaks here and there, and testing the corner cases I can think of, I'm happy with the result and think it is worth sending out for a code review for JDK7, knowing we only have a couple of days left.


The webrev is at

http://cr.openjdk.java.net/~sherman/7040220/webrev

The tests in the webrev are meant to make sure that the coding results from the new paths for String.getBytes()/toCharArray() match the results from the existing implementation.
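A consistency check in the same spirit might look like the following sketch. It is not the test from the webrev. It assumes the public CharsetEncoder/CharsetDecoder API as a stand-in for the "existing" path, and the class and method names are hypothetical. It only generates well-formed input (surrogate code points are skipped), so both paths should produce identical results.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Random;

final class Utf8ConsistencyCheck {

    /** Builds a random well-formed string mixing 1- to 4-byte UTF-8 code points. */
    static String randomString(Random rnd, int codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < codePoints; i++) {
            int cp;
            switch (rnd.nextInt(4)) {
                case 0:  cp = rnd.nextInt(0x80); break;                     // 1 byte
                case 1:  cp = 0x80 + rnd.nextInt(0x800 - 0x80); break;      // 2 bytes
                case 2:  do { cp = 0x800 + rnd.nextInt(0x10000 - 0x800); }  // 3 bytes,
                         while (cp >= 0xD800 && cp <= 0xDFFF);              // no surrogates
                         break;
                default: cp = 0x10000 + rnd.nextInt(0x110000 - 0x10000);    // 4 bytes
            }
            sb.appendCodePoint(cp);
        }
        return sb.toString();
    }

    /** True if String's own coding agrees with the CharsetEncoder/CharsetDecoder path. */
    static boolean agrees(String s) {
        Charset utf8 = StandardCharsets.UTF_8;
        try {
            byte[] viaString = s.getBytes(utf8);
            ByteBuffer bb = utf8.newEncoder().encode(CharBuffer.wrap(s));
            byte[] viaEncoder = new byte[bb.remaining()];
            bb.get(viaEncoder);
            if (!Arrays.equals(viaString, viaEncoder)) return false;

            char[] viaStringChars = new String(viaString, utf8).toCharArray();
            CharBuffer cb = utf8.newDecoder().decode(ByteBuffer.wrap(viaEncoder));
            char[] viaDecoderChars = new char[cb.remaining()];
            cb.get(viaDecoderChars);
            return Arrays.equals(viaStringChars, viaDecoderChars);
        } catch (CharacterCodingException e) {
            // Encoders/decoders from newEncoder()/newDecoder() report errors by default.
            throw new IllegalArgumentException("input is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);   // fixed seed so failures are reproducible
        for (int i = 0; i < 1000; i++) {
            String s = randomString(rnd, rnd.nextInt(64));
            if (!agrees(s)) throw new AssertionError("mismatch for sample " + i);
        }
        System.out.println("all samples agree");
    }
}
```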

The performance results of running StrCodingBenchmarkUTF8 (included in the webrev) on my Linux box, in -client and -server mode respectively, are at

http://cr.openjdk.java.net/~sherman/7040220/client
http://cr.openjdk.java.net/~sherman/7040220/server

The microbenchmark measures 1-byte, 2-byte, 3-byte and 4-byte UTF-8 characters separately, with data lengths ranging from 12 bytes to thousands.
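For reference, the 1- to 4-byte classes correspond to the code point ranges U+0000..U+007F, U+0080..U+07FF, U+0800..U+FFFF (excluding surrogates) and U+10000..U+10FFFF. The sketch below is not StrCodingBenchmarkUTF8 itself; its names and the chosen sample characters are hypothetical. It builds inputs of each class with a fixed encoded length and runs a crude timing loop. Numbers from a loop like this are only indicative; a proper benchmark harness is needed for real measurements.

```java
import java.nio.charset.StandardCharsets;

final class Utf8BenchInputs {
    // One sample code point per UTF-8 sequence length: 'a', e-acute, euro sign, G clef.
    static final int[] SAMPLE_CP = { 'a', 0x00E9, 0x20AC, 0x1D11E };   // 1, 2, 3, 4 bytes
    static final int REPS = 1_000;

    /** Builds a string whose UTF-8 encoding is exactly `bytes` long, made only of
     *  code points that each encode to `width` bytes. */
    static String build(int width, int bytes) {
        if (width < 1 || width > 4 || bytes % width != 0)
            throw new IllegalArgumentException("bytes must be a multiple of width (1..4)");
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes / width; i++) sb.appendCodePoint(SAMPLE_CP[width - 1]);
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] sizes = {12, 120, 1200, 12000};   // each is divisible by 1, 2, 3 and 4
        for (int width = 1; width <= 4; width++) {
            for (int size : sizes) {
                String s = build(width, size);
                long sink = 0;                   // consume results so the loop is not dead code
                long t0 = System.nanoTime();
                for (int r = 0; r < REPS; r++) {
                    byte[] b = s.getBytes(StandardCharsets.UTF_8);
                    sink += new String(b, StandardCharsets.UTF_8).length();
                }
                long t1 = System.nanoTime();
                System.out.printf("%d-byte chars, %6d bytes: %.1f ns/iter (sink=%d)%n",
                        width, size, (t1 - t0) / (double) REPS, sink);
            }
        }
    }
}
```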

Thanks!
-Sherman

[1] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006710.html
[2] http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006726.html
[3] http://cr.openjdk.java.net/~sherman/6636323_6636319/webrev
[4] http://blogs.sun.com/xuemingshen/entry/faster_new_string_bytes_cs
[5] http://blogs.sun.com/xuemingshen/entry/the_big_overhaul_of_java

Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()
