On 07/13/2012 05:19 AM, Alan Bateman wrote:
On 11/07/2012 00:11, Xueming Shen wrote:
Hi,
In JDK7, the decoder and encoder implementations of most of our
single-byte charsets and the UTF-8 charset are optimized to implement
the internal interface sun.nio.cs.ArrayDecoder/Encoder, which provides
a fastpath for String.getBytes(...) and new String(byte[]...)
operations. I have an old blog entry about this optimization at
https://blogs.oracle.com/xuemingshen/entry/faster_new_string_bytes_cs
This RFE, as a follow-up to the above changes, implements
ArrayDecoder/Encoder for most of the sun.nio.cs.ext.DoubleByte-based
double-byte charsets. Here is the webrev:
http://cr.openjdk.java.net/~sherman/7183053/webrev
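For readers who want to see which public entry points are affected, here is a minimal sketch of a round trip through String.getBytes(Charset) and new String(byte[], Charset). The class and method names (StringCodingRoundTrip, roundTrip, pick) are hypothetical and not part of the webrev. Big5 is used only as an example of a double-byte charset, and the sketch falls back to UTF-8 if the runtime does not ship it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StringCodingRoundTrip {
    // Encodes and then decodes through the two String entry points that
    // the ArrayDecoder/Encoder fastpath is meant to speed up.
    static String roundTrip(String s, Charset cs) {
        byte[] bytes = s.getBytes(cs);   // encode path
        return new String(bytes, cs);    // decode path
    }

    // Returns the named charset if this runtime supports it, else UTF-8.
    static Charset pick(String name) {
        return Charset.isSupported(name)
                ? Charset.forName(name)
                : StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        Charset cs = pick("Big5");
        String text = "abc\u4e2d\u6587"; // ASCII plus two CJK characters
        System.out.println(cs + " round trip ok: "
                + text.equals(roundTrip(text, cs)));
    }
}
```

Whether a given call actually takes the fastpath is an internal detail of the JDK and cannot be observed from this code; the sketch only shows the call sites in question.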
I've taken a pass over this and it's great to see
DoubleByte.Decoder/Encoder implementing
sun.nio.cs.ArrayDecoder/Encoder. The results look good too: there are
a small number of regressions (Big5 at len=32, for example), but this is
a micro benchmark and I'm sure there are fluctuations. I don't see
anything obviously wrong with the EBCDIC changes, though I'd need a
history book to remember how the shifts between DBCS and SBCS work. I
think our tests are good for this area, so I'm happy. One minor nit is
the continue in both encode methods; I think it would be cleaner to use
"else if (bb ..." instead.
The continue might make the vm happy, but this is code I wrote last
October, so I might be wrong. I will give it a couple of runs later
with "else".
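To make the two styles under discussion concrete, here is a minimal sketch of an encode loop written both ways. It is not the code from the webrev: encodeChar is a hypothetical toy mapping (ASCII maps to one byte, U+4E00..U+4EFF to two bytes, everything else is unmappable and replaced with '?'), and the class and method names are invented for illustration.

```java
import java.util.Arrays;

public class EncodeLoopStyles {
    static final int UNMAPPABLE = -1;

    // Hypothetical toy mapping, not a real charset.
    static int encodeChar(char c) {
        if (c < 0x80) return c;                              // single byte
        if (c >= 0x4E00 && c <= 0x4EFF) return 0x8100 + (c - 0x4E00); // double byte
        return UNMAPPABLE;
    }

    // Style 1: handle the unmappable case, then "continue".
    static byte[] encodeWithContinue(char[] src) {
        byte[] dst = new byte[src.length * 2];   // worst case: 2 bytes per char
        int dp = 0;
        for (int sp = 0; sp < src.length; sp++) {
            int bb = encodeChar(src[sp]);
            if (bb == UNMAPPABLE) {
                dst[dp++] = (byte) '?';
                continue;
            }
            if (bb > 0xFF) {                     // double byte
                dst[dp++] = (byte) (bb >> 8);
                dst[dp++] = (byte) bb;
            } else {                             // single byte
                dst[dp++] = (byte) bb;
            }
        }
        return Arrays.copyOf(dst, dp);
    }

    // Style 2: the same logic as a single if / else if / else chain.
    static byte[] encodeWithElse(char[] src) {
        byte[] dst = new byte[src.length * 2];
        int dp = 0;
        for (int sp = 0; sp < src.length; sp++) {
            int bb = encodeChar(src[sp]);
            if (bb == UNMAPPABLE) {
                dst[dp++] = (byte) '?';
            } else if (bb > 0xFF) {
                dst[dp++] = (byte) (bb >> 8);
                dst[dp++] = (byte) bb;
            } else {
                dst[dp++] = (byte) bb;
            }
        }
        return Arrays.copyOf(dst, dp);
    }
}
```

Both methods produce the same bytes for the same input, so the choice is about readability and whatever the JIT happens to prefer, which would need to be measured rather than assumed.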
I see in TestStringCoding.java that you've commented out the test that
goes over the buffer limit - would I be correct to say that this isn't
an issue and this happens with DB charsets today?
This is also true for the UTF-8 work I did last year, but UTF-8 is
excluded at the beginning of the test. For SB, it takes advantage of
the fact that the output char[] always has the same length as the input
bytes, so this can be checked once at the very beginning. For MB,
checking both sp and dp slows down the de/encoding (the vm obviously
does not like too many "if"s). Given that this is an internal interface
used exclusively by StringCoding, which has already calculated the max
buffer size to feed in, I think this check is something that can be
optimized away.
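A rough sketch of the idea, under stated assumptions: the caller sizes the destination for the worst case, so the inner loop does not need a destination bounds check on every iteration. CharsetEncoder.maxBytesPerChar() is a public API; maxEncodedLength and encodeLatin1 are hypothetical helpers written for this example and are not the StringCoding or DoubleByte code.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class PreSizedBuffer {
    // Worst-case number of bytes needed to encode `len` chars in `cs`.
    static int maxEncodedLength(Charset cs, int len) {
        CharsetEncoder ce = cs.newEncoder();
        return (int) Math.ceil(len * (double) ce.maxBytesPerChar());
    }

    // Single-byte style: one char in, one byte out, so a single length
    // check before the loop covers every write to dst.
    static int encodeLatin1(char[] src, int sp, int len, byte[] dst) {
        if (dst.length < len) {
            throw new IllegalArgumentException("dst too small");
        }
        int dp = 0;
        int sl = sp + len;
        while (sp < sl) {                 // only the source index is tested
            char c = src[sp++];
            dst[dp++] = (c <= 0xFF) ? (byte) c : (byte) '?';
        }
        return dp;
    }

    public static void main(String[] args) {
        char[] src = "caf\u00e9".toCharArray();
        byte[] dst = new byte[maxEncodedLength(StandardCharsets.ISO_8859_1, src.length)];
        int n = encodeLatin1(src, 0, src.length, dst);
        System.out.println("encoded " + n + " bytes");
    }
}
```

For a multi-byte charset the output length is not known in advance, which is why a pre-sized worst-case buffer from the caller is what makes it possible to skip the per-iteration dp check.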
-Sherman
Ulf - you've got several patches to the double byte charsets and I
wonder if you have cycles to try Sherman's patch with jdk8 to see if
there is any more to be gained?
-Alan.