toCharArray()

Xueming Shen Mon, 02 May 2011 10:34:52 -0700

 On 5/2/2011 7:31 AM, Alan Bateman wrote:

Xueming Shen wrote:
 Hi
This is motivated by Neil's request to optimize the common-case UTF8 path for native ZipFile.getEntry calls [1]. As I said in my replying email [2], I believe a better approach might be to "patch" the UTF8 charset directly to implement the sun.nio.cs.ArrayDecoder/Encoder interface to speed up the coding operation for array-based encoding/decoding under certain circumstances, as we did for all single-byte charsets in #6636323 [3]. I have an old blog [4] that has some data for this optimization.
The original plan was to do the same thing for our new UTF8 [5] as well in JDK7, but then (excuse, excuse) I was just too busy to come back to this topic till 2 days ago. After two days of small tweaking here and there and testing those possible corner cases I can think of, I'm happy with the result and think it might be worth sending it out for a code review for JDK7, knowing we only have a couple of days left.
The webrev is at

http://cr.openjdk.java.net/~sherman/7040220/webrev
I went through the changes and the approach looks good to me - thanks for jumping on this. Thanks Ulf for helping review. Also thanks Neil for reporting this and testing with Sherman's change to verify that it addresses the performance regression.
Sherman - just a couple of minor comments:


Thanks Alan!

Webrev has been updated accordingly.

I renamed the getBBuffer to getByteBuffer, now it "looks" better:-)

Thanks,
-Sherman

It would be good to put a blank line after the ASCII-only loops so that future maintainers can easily distinguish the loops (this would make it consistent with decodeArrayLoop for example).
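For readers without the webrev at hand, the layout being asked for can be sketched roughly as follows. This is only an illustration, not the code under review: the class and method names are hypothetical, and the non-ASCII remainder is simply handed to the standard decoder rather than to a hand-written loop.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch; not the JDK's UTF_8 decoder.
public class AsciiFastPathSketch {

    static String decode(byte[] src) {
        char[] dst = new char[src.length];
        int sp = 0;
        int dp = 0;

        // ASCII-only loop: in UTF-8 a non-negative byte is one ASCII char.
        while (sp < src.length && src[sp] >= 0) {
            dst[dp++] = (char) src[sp++];
        }

        // General path (separated from the fast loop by a blank line):
        // sp now sits on a character boundary, so the remaining bytes
        // can be decoded independently by the standard decoder.
        String head = new String(dst, 0, dp);
        if (sp == src.length) {
            return head;
        }
        return head + new String(src, sp, src.length - sp, StandardCharsets.UTF_8);
    }
}
```

The blank line is purely cosmetic, but it makes the boundary between the fast path and the general path visible at a glance.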
In several places the test is "if (CodingErrorAction.REPLACE != malformedInputAction())". Personally I would swap this to "if (malformedInputAction() != CodingErrorAction.REPLACE)". This reminds me, a couple of these slipped through into the zip code recently, e.g.: "if (false == inf.ended())", "if (false == streams.isEmpty())".
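The two orderings are equivalent in behaviour; the point is readability only. A small sketch, assuming a plain CharsetDecoder obtained from StandardCharsets.UTF_8 (the class and method names here are hypothetical and not part of the webrev):

```java
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch comparing the two ways of writing the same test.
public class ComparisonOrderSketch {

    // Constant-first ordering, as written in the reviewed change.
    static boolean constantFirst(CharsetDecoder dec) {
        return CodingErrorAction.REPLACE != dec.malformedInputAction();
    }

    // Call-first ordering, as suggested in the review.
    static boolean callFirst(CharsetDecoder dec) {
        return dec.malformedInputAction() != CodingErrorAction.REPLACE;
    }

    // True when both orderings agree for the given decoder.
    static boolean agree(CharsetDecoder dec) {
        return constantFirst(dec) == callFirst(dec);
    }

    static CharsetDecoder newDecoder(CodingErrorAction action) {
        return StandardCharsets.UTF_8.newDecoder().onMalformedInput(action);
    }
}
```

Since Java does not allow an accidental assignment in a condition on non-boolean types, the constant-first form buys nothing here.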
The caching of the ByteBuffer and getBBuffer for the malformed case isn't as nice as the original code. Minimally I would move the declaration of bb so that it's not in the middle of dl and dp, and rename getBBuffer. Alternatively I would just get rid of it as it shouldn't be performance critical.
In TestStringCodeUTF8 it might be cleaner to put the body of main into its own method and then call it once, set the security manager, and then call it again.
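The suggested shape might look roughly like this. It is a sketch only: the class name, the test() method and the sample strings are hypothetical and not taken from TestStringCodeUTF8, and the security-manager step is left as a comment because recent JDK releases no longer allow installing one at run time.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of the suggested test layout.
public class TestStructureSketch {
    static int runs = 0;

    // The former body of main, moved into its own method.
    static void test() {
        String[] samples = { "plain ascii", "caf\u00e9", "\u4e2d\u6587" };
        for (String s : samples) {
            byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
            String decoded = new String(encoded, StandardCharsets.UTF_8);
            if (!Arrays.equals(decoded.toCharArray(), s.toCharArray())) {
                throw new AssertionError("round trip failed for: " + s);
            }
        }
        runs++;
    }

    public static void main(String[] args) {
        test();
        // In the 2011 test this is where the security manager would be
        // installed, e.g. System.setSecurityManager(new SecurityManager());
        // it is omitted here so the sketch runs on current JDKs.
        test();
    }
}
```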
That's all I have,

-Alan.

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()
