toCharArray()

Ulf Zibis Thu, 28 Apr 2011 15:47:57 -0700

On 28.04.2011 23:28, Xueming Shen wrote:

On 04/28/2011 01:55 PM, Ulf Zibis wrote:
On 28.04.2011 21:56, Xueming Shen wrote:
That said, you do have the point, we should do better even in
malformed case, ...
Yes, that's what I wanted to point out.
But I thought you could go one step further and declare bb as a member of UTF_8.Decoder. Then it would have to be guaranteed that a decoder is used by only one thread at a time. I don't know whether that is the case for the typical use cases?
Why do you want to "re-use" a ByteBuffer object across decode(byte[]...) invocations?
I don't see any benefit of doing that.

Thinking again, I see my error. It's not reusable, because its size is always different, so the question about the benefit seems obsolete. The benefit could have been: if the strings are fairly short AND the malformed case is fairly frequent, instantiating new ByteBuffers each time could decrease overall performance by some percentage.

In http://cr.openjdk.java.net/~mduigou/4884238/2/webrev/ I've seen the change to use a constant Charset object instead of a constant charset name in some method calls. From your benchmark it seems that using constant charset names gives a small performance gain (0..25 %), so I don't see the benefit of the changes from 4884238, which go in the opposite direction.
That is a totally different topic:-)

Yes, you don't benefit from using a "Charset object" when doing String.getBytes()/toCharArray()
because of our caching optimization in the StringCoding class. But that is a pure implementation
detail.

I think this fact should be mentioned in the javadoc of String.getBytes() etc. I guess a typical programmer would expect the StandardCharset.UTF_8 version to be faster than the csn version.

It's safe to say that java.nio.cs.StandardCharset is not for 
String.getBytes()/toCharArray()
only, so the fact that "cs" variant of String.getBytes()/toCharArray() is "slower" than 
its "csn"
variant arguably might not be a very strong/supportive material for that 
discussion:-)
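
For readers following along, the two variants being compared can be sketched as below. This is only an illustration, not the benchmark from the review: the class and method names are made up, and it assumes the constants class under the name it later shipped with in Java 7, java.nio.charset.StandardCharsets. Both calls produce the same bytes; the thread is only about their relative speed, which this sketch does not measure.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class CharsetVariants {

    // "csn" variant: the charset is looked up by name on each call.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a required charset on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // "cs" variant: a constant Charset object is passed in directly.
    static byte[] encodeByCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }
}
```

The "cs" variant also avoids the checked UnsupportedEncodingException, which is a reason to prefer it that is independent of performance.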

So what prevents us from applying the same caching optimization in ZipCoder and similar classes?


- ZipCoder.isutf8 is unreadable. Better: isUTF8

- ArrayDecoder.decode(ba, 0, length, ca) could throw MalformedInput/UnmappableCharacterException instead of returning -1. Benefits:

-- avoids translating -1 to IllegalArgumentException("MALFORMED") in ZipCoder etc.
-- more precise exception
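
A rough sketch of the difference, using only the public CharsetDecoder API rather than the JDK-internal ArrayDecoder (the class and method names here are hypothetical, not part of the webrev): with CodingErrorAction.REPORT the decoder throws a specific CharacterCodingException subclass, whereas the second method mimics collapsing any failure into a generic IllegalArgumentException("MALFORMED").

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
final class StrictDecode {

    // Throws MalformedInputException or UnmappableCharacterException
    // (both CharacterCodingException subclasses) instead of returning -1.
    static String decodeOrThrow(byte[] ba, int off, int len)
            throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(ba, off, len)).toString();
    }

    // Mimics the criticized pattern: the caller only learns "MALFORMED".
    static String decodeGenericError(byte[] ba) {
        try {
            return decodeOrThrow(ba, 0, ba.length);
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("MALFORMED");
        }
    }

    // Reports which exception type the strict variant raises, or "none".
    static String failureKind(byte[] ba) {
        try {
            decodeOrThrow(ba, 0, ba.length);
            return "none";
        } catch (CharacterCodingException e) {
            return e.getClass().getSimpleName();
        }
    }
}
```

With the strict variant, a caller can tell malformed input apart from an unmappable character and can read the offending input length from the exception.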


-Ulf

Re: Codereview request: CR 7040220 java/char_encodin Optimize UTF-8 charset for String.getBytes()/toCharArray()
