Re: Codereview request for 7183053: Optimize DoubleByte charset for String.getBytes()/new String(byte[])

Xueming Shen Fri, 13 Jul 2012 10:03:55 -0700

On 07/13/2012 05:19 AM, Alan Bateman wrote:

On 11/07/2012 00:11, Xueming Shen wrote:
Hi,
In JDK7, the decoder and encoder implementations of most of our single-byte charsets and of the UTF-8 charset are optimized to implement the internal interface sun.nio.cs.ArrayDecoder/Encoder, which provides a fast path for String.getBytes(...) and new String(byte[]...) operations. I have an old blog entry about this optimization at

https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs
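A minimal sketch of the fast-path idea, not the JDK's actual StringCoding code: the nested `ArrayDecoder` interface is a hypothetical stand-in for the internal `sun.nio.cs.ArrayDecoder`, and `Latin1Decoder` and `newString` are illustrative names. When the decoder also implements the array interface, the caller skips the `ByteBuffer`/`CharBuffer` wrapping and decodes straight into a `char[]`; otherwise it falls back to the public `CharsetDecoder` API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FastPathSketch {

    // Hypothetical stand-in for sun.nio.cs.ArrayDecoder: decode len bytes
    // starting at src[off] directly into dst, return the chars written.
    public interface ArrayDecoder {
        int decode(byte[] src, int off, int len, char[] dst);
    }

    // Toy ISO-8859-1 decoder offering both the standard buffer-based
    // decodeLoop and the array fast path.
    public static final class Latin1Decoder extends CharsetDecoder
            implements ArrayDecoder {
        public Latin1Decoder() {
            super(StandardCharsets.ISO_8859_1, 1.0f, 1.0f);
        }

        @Override
        protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
            while (in.hasRemaining()) {
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW;
                }
                out.put((char) (in.get() & 0xff));
            }
            return CoderResult.UNDERFLOW;
        }

        @Override
        public int decode(byte[] src, int off, int len, char[] dst) {
            for (int i = 0; i < len; i++) {
                dst[i] = (char) (src[off + i] & 0xff);
            }
            return len;
        }
    }

    // Use the array fast path when the decoder offers it; otherwise go
    // through ByteBuffer/CharBuffer and the general CharsetDecoder API.
    public static String newString(CharsetDecoder cd, byte[] ba, int off, int len) {
        if (cd instanceof ArrayDecoder) {
            char[] ca = new char[(int) (len * (double) cd.maxCharsPerByte())];
            int n = ((ArrayDecoder) cd).decode(ba, off, len, ca);
            return new String(ca, 0, n);
        }
        try {
            return cd.onMalformedInput(CodingErrorAction.REPLACE)
                     .onUnmappableCharacter(CodingErrorAction.REPLACE)
                     .decode(ByteBuffer.wrap(ba, off, len))
                     .toString();
        } catch (CharacterCodingException e) {
            // Not expected: REPLACE handles malformed and unmappable input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {0x48, 0x69, (byte) 0xE9};
        System.out.println(newString(new Latin1Decoder(), bytes, 0, bytes.length));
        System.out.println(newString(StandardCharsets.ISO_8859_1.newDecoder(),
                                     bytes, 0, bytes.length));
    }
}
```

Both calls should yield the same string; only the first avoids the buffer wrappers.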
This RFE, as the follow-up to the above changes, is to implement ArrayDe/Encoder for most of the sun.nio.cs.ext.DoubleByte-based double-byte charsets. Here is the webrev:
http://cr.openjdk.java.net/~sherman/7183053/webrev
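For a double-byte charset, the array fast path amounts to a table-driven loop over the input bytes. The sketch below assumes a made-up charset (ASCII as single bytes, lead bytes 0x81-0x82 with trail bytes 0x40-0x41) and hypothetical table names `b2cSB`/`b2c`; it illustrates the general shape only and is not the code in the webrev. It also simplifies error handling by treating any byte without a single-byte mapping as a lead byte.

```java
import java.util.Arrays;

public class ToyDoubleByte {

    // Hypothetical mirror of the internal ArrayDecoder method shape.
    public interface ArrayDecoder {
        int decode(byte[] src, int off, int len, char[] dst);
    }

    public static final class Decoder implements ArrayDecoder {
        static final char UNMAPPABLE = '\uFFFD';       // table sentinel: no mapping
        static final char REPL = '\uFFFD';             // emitted for bad input
        static final int B2_MIN = 0x40, B2_MAX = 0x41; // toy trail-byte range

        final char[] b2cSB = new char[256];    // single-byte mappings
        final char[][] b2c = new char[256][];  // b2c[lead][trail - B2_MIN]

        public Decoder() {
            Arrays.fill(b2cSB, UNMAPPABLE);
            for (int b = 0; b < 0x80; b++) {
                b2cSB[b] = (char) b;               // ASCII maps to itself
            }
            char[] none = new char[B2_MAX - B2_MIN + 1];
            Arrays.fill(none, UNMAPPABLE);
            Arrays.fill(b2c, none);
            b2c[0x81] = new char[] {'\u4E00', '\u4E8C'};
            b2c[0x82] = new char[] {'\u4E09', UNMAPPABLE};
        }

        @Override
        public int decode(byte[] src, int sp, int len, char[] dst) {
            int dp = 0;
            int sl = sp + len;
            while (sp < sl) {
                int b1 = src[sp++] & 0xff;
                char c = b2cSB[b1];
                if (c == UNMAPPABLE) {
                    // No single-byte mapping: treat b1 as a lead byte.
                    if (sp < sl) {
                        int b2 = src[sp++] & 0xff;
                        if (b2 < B2_MIN || b2 > B2_MAX
                                || (c = b2c[b1][b2 - B2_MIN]) == UNMAPPABLE) {
                            c = REPL;   // bad trail byte or unmapped pair
                        }
                    } else {
                        c = REPL;       // input ends in the middle of a pair
                    }
                }
                // No dp bounds check: the caller provides dst.length >= len.
                dst[dp++] = c;
            }
            return dp;
        }
    }

    public static void main(String[] args) {
        byte[] in = {0x41, (byte) 0x81, 0x40, (byte) 0x82, 0x40};
        char[] out = new char[in.length];   // at most one char per byte
        int n = new Decoder().decode(in, 0, in.length, out);
        System.out.println(new String(out, 0, n));
    }
}
```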
I've taken a pass over this and it's great to see DoubleByte.Decoder/Encoder implementing sun.nio.cs.ArrayDecoder/Encoder. The results look good too; there are a small number of regressions (Big5 at len=32, for example), but this is a microbenchmark and I'm sure there are fluctuations. I don't see anything obviously wrong with the EBCDIC changes, although I'd need a history book to remember how the shifts between DBCS and SBCS work. I think our tests are good for this area, so I'm happy. One minor nit is the continue in both encode methods; I think it would be cleaner to use "else if (bb ..." instead.

The continue might make the VM happy, but this is code I wrote last October, so I might be wrong. I will give it a couple of runs later with "else".
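The style question is about control flow only. The two loops below, written against a hypothetical toy mapping (surrogate handling omitted), behave identically: one leaves the unmappable branch with `continue`, the other folds it into an `if / else if / else` chain. Whether either form lets the VM generate faster code is what the runs mentioned above would have to show; nothing here measures it.

```java
import java.util.Arrays;

public class EncodeLoopStyles {

    static final int UNMAPPABLE = 0xFFFD;   // hypothetical "no mapping" marker
    static final int MAX_SINGLEBYTE = 0xFF;
    static final byte REPL = (byte) '?';    // single-byte replacement

    // Toy mapping: ASCII as single bytes, three CJK chars as byte pairs.
    static int encodeChar(char c) {
        if (c < 0x80) {
            return c;
        }
        switch (c) {
            case '\u4E00': return 0x8140;
            case '\u4E8C': return 0x8141;
            case '\u4E09': return 0x8240;
            default:       return UNMAPPABLE;
        }
    }

    // Style 1: the unmappable branch ends with `continue`.
    static int encodeWithContinue(char[] src, int sp, int len, byte[] dst) {
        int dp = 0;
        int sl = sp + len;
        while (sp < sl) {
            int bb = encodeChar(src[sp++]);
            if (bb == UNMAPPABLE) {
                dst[dp++] = REPL;
                continue;
            }
            if (bb > MAX_SINGLEBYTE) {        // double-byte
                dst[dp++] = (byte) (bb >> 8);
                dst[dp++] = (byte) bb;
            } else {                          // single-byte
                dst[dp++] = (byte) bb;
            }
        }
        return dp;
    }

    // Style 2: the same logic as one if / else if / else chain.
    static int encodeWithElseIf(char[] src, int sp, int len, byte[] dst) {
        int dp = 0;
        int sl = sp + len;
        while (sp < sl) {
            int bb = encodeChar(src[sp++]);
            if (bb == UNMAPPABLE) {
                dst[dp++] = REPL;
            } else if (bb > MAX_SINGLEBYTE) { // double-byte
                dst[dp++] = (byte) (bb >> 8);
                dst[dp++] = (byte) bb;
            } else {                          // single-byte
                dst[dp++] = (byte) bb;
            }
        }
        return dp;
    }

    public static void main(String[] args) {
        char[] src = {'A', '\u4E00', '\u00E9', '\u4E09'};
        byte[] a = new byte[src.length * 2];  // at most 2 bytes per char
        byte[] b = new byte[src.length * 2];
        int na = encodeWithContinue(src, 0, src.length, a);
        int nb = encodeWithElseIf(src, 0, src.length, b);
        System.out.println(na == nb && Arrays.equals(a, b)); // prints true
    }
}
```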

I see in TestStringCoding.java that you've commented out the test that goes over the buffer limit. Would I be correct to say that this isn't an issue and that this happens with DB charsets today?

This is also true for the UTF-8 change I did last year, but UTF-8 is excluded at the beginning of the test. For SB, the implementation takes advantage of the fact that the length of the output char[] should always be the same as the length of the input bytes, so both bounds can be checked together at the very beginning. For MB, checking both sp and dp slows down the de/encoding (the VM obviously does not like too many "if"s). Given that this is an internal interface used exclusively by StringCoding, which has already calculated the maximum buffer size to feed in, I think this is something that can be optimized.
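The trade-off described above can be sketched as follows, under assumptions: `sbDecode` and `mbEncodeUnchecked` are hypothetical helpers, and UTF-8 serves as the multi-byte example only because it is the other charset mentioned here (the hand-written encoder handles BMP, non-surrogate chars only). In the single-byte case one clamp bounds both indices; in the multi-byte case the caller sizes the output with the public `CharsetEncoder.maxBytesPerChar()`, so the inner loop never checks `dp`. This is presumably why a deliberately undersized-buffer test, like the one commented out in TestStringCoding.java, does not fit such an internal contract.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BufferSizingSketch {

    // Single-byte decode: one byte always yields one char, so a single
    // up-front clamp bounds both the source and the destination index.
    static int sbDecode(byte[] src, int sp, int len, char[] dst) {
        len = Math.min(len, dst.length);
        for (int i = 0; i < len; i++) {
            dst[i] = (char) (src[sp + i] & 0xff);   // toy Latin-1 mapping
        }
        return len;
    }

    // Multi-byte encode with no dp check in the loop. Assumes BMP,
    // non-surrogate chars only and a dst sized for the worst case.
    static int mbEncodeUnchecked(char[] src, int sp, int len, byte[] dst) {
        int dp = 0;
        int sl = sp + len;
        while (sp < sl) {
            char c = src[sp++];
            if (c < 0x80) {
                dst[dp++] = (byte) c;
            } else if (c < 0x800) {
                dst[dp++] = (byte) (0xC0 | (c >> 6));
                dst[dp++] = (byte) (0x80 | (c & 0x3F));
            } else {
                dst[dp++] = (byte) (0xE0 | (c >> 12));
                dst[dp++] = (byte) (0x80 | ((c >> 6) & 0x3F));
                dst[dp++] = (byte) (0x80 | (c & 0x3F));
            }
        }
        return dp;
    }

    // Caller: size the output once from maxBytesPerChar() so the hot
    // loop only has to test sp.
    static byte[] getBytes(String s) {
        char[] ca = s.toCharArray();
        CharsetEncoder ce = StandardCharsets.UTF_8.newEncoder();
        byte[] ba = new byte[(int) (ca.length * (double) ce.maxBytesPerChar())];
        int n = mbEncodeUnchecked(ca, 0, ca.length, ba);
        return Arrays.copyOf(ba, n);
    }

    public static void main(String[] args) {
        String s = "A\u00e9\u20ac";
        System.out.println(Arrays.equals(getBytes(s),
                                         s.getBytes(StandardCharsets.UTF_8)));
    }
}
```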


-Sherman

Ulf, you've got several patches to the double-byte charsets, and I wonder if you have cycles to try Sherman's patch with jdk8 to see if there is any more to be gained.
-Alan.
