On 28.04.2011 23:28, Xueming Shen wrote:
On 04/28/2011 01:55 PM, Ulf Zibis wrote:
On 28.04.2011 21:56, Xueming Shen wrote:
That said, you do have a point: we should do better even in the
malformed case, ...
Yes, that's what I wanted to point out.
But I thought you could go one step further and declare bb as a member of UTF_8.Decoder. Then it would have to be guaranteed that a decoder is used by only one thread at a time. I don't know whether that is the case for the typical use cases?

Why do you want to "re-use" a ByteBuffer object across decode(byte[]...) invocations?
I don't see any benefit of doing that.
Thinking again, I see my error. It's not re-usable, because its size is always different, so the question about the benefit seems obsolete. The benefit could have been: if the strings are fairly short AND the malformed case is fairly frequent, instantiating a new ByteBuffer each time could decrease the overall performance by some percentage.


In http://cr.openjdk.java.net/~mduigou/4884238/2/webrev/ I've seen the change to use a constant Charset object instead of a constant charset name in some method calls. From your benchmark it seems that using constant charset names gives a small performance gain (0 to 25 %), so I don't see the benefit of the changes from 4884238, which go in the opposite direction.


That is a totally different topic:-)

Yes, you don't benefit from using a "Charset object" when doing String.getBytes()/toCharArray()
because of our caching optimization in the StringCoding class. But that is a pure implementation detail.
I think this fact should be mentioned in the javadoc of String.getBytes() etc. I guess the average programmer would expect the StandardCharset.UTF_8 version to be faster than the csn version.
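
For reference, a minimal sketch of the two String.getBytes() variants being compared. It assumes the constants are reachable as java.nio.charset.StandardCharsets (the name under which they shipped in Java 7; the thread calls the class StandardCharset). The class and method names are hypothetical, and the sketch only shows that both calls yield the same bytes; it makes no claim about which one is faster.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK or of the webrev under review.
class GetBytesVariants {

    // "csn" variant: the charset is looked up by name on every call,
    // so the method declares a checked exception that must be handled.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // "cs" variant: the caller passes a constant Charset object.
    static byte[] encodeByCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "Gr\u00fc\u00dfe";
        // Both variants are expected to produce identical byte sequences.
        System.out.println(Arrays.equals(encodeByName(s), encodeByCharset(s)));
    }
}
```

Any speed difference between the two calls comes from the implementation behind them, which is the point made above about documenting it.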

It's safe to say that java.nio.cs.StandardCharset is not for String.getBytes()/toCharArray() only,
so the fact that the "cs" variant of String.getBytes()/toCharArray() is "slower" than its "csn" variant
arguably might not be very strong supporting material for that discussion:-)
So what prevents us from applying the same caching optimization in ZipCoder and similar classes?
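
To illustrate what "the same caching optimization" could look like outside StringCoding, here is a rough sketch of a per-thread cache keyed by charset name. This is an assumption about the general technique, not the actual StringCoding or ZipCoder code; the class CachedEncoderSketch, its Entry type and encoderFor() are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical sketch: remember the last encoder used by the current thread.
class CachedEncoderSketch {

    // One cached (name, encoder) pair per thread.
    private static final class Entry {
        final String name;
        final CharsetEncoder encoder;

        Entry(String name, CharsetEncoder encoder) {
            this.name = name;
            this.encoder = encoder;
        }
    }

    private static final ThreadLocal<Entry> CACHE = new ThreadLocal<>();

    // Returns a reset encoder for the named charset, reusing the cached one
    // when the same name is requested again on the same thread.
    static CharsetEncoder encoderFor(String csn) {
        Entry e = CACHE.get();
        if (e == null || !e.name.equals(csn)) {
            e = new Entry(csn, Charset.forName(csn).newEncoder());
            CACHE.set(e);
        }
        return e.encoder.reset();
    }

    public static void main(String[] args) {
        CharsetEncoder first = encoderFor("UTF-8");
        CharsetEncoder second = encoderFor("UTF-8");
        // The second lookup skips Charset.forName() and newEncoder().
        System.out.println(first == second);
    }
}
```

Because the cache is thread-local, the encoder is never shared between threads, which sidesteps the single-thread question raised earlier for a decoder member.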


- ZipCoder.isutf8 is unreadable. Better: isUTF8

- ArrayDecoder.decode(ba, 0, length, ca) could throw MalformedInputException/UnmappableCharacterException instead of returning -1. Benefits:
-- no need to translate -1 into IllegalArgumentException("MALFORMED") in ZipCoder etc.
-- more precise exception
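
A minimal sketch of the exception-based contract suggested above, built only on the public CharsetDecoder API. The class StrictDecodeSketch and its methods are hypothetical; the JDK-internal ArrayDecoder interface is not used here, and its real signature may differ from the one mimicked below.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a decode(ba, off, len, ca) that throws
// instead of returning -1 on bad input.
class StrictDecodeSketch {

    // Returns the number of chars written to ca, or throws
    // MalformedInputException / UnmappableCharacterException.
    static int decode(byte[] ba, int off, int len, char[] ca)
            throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(ba, off, len);
        CharBuffer out = CharBuffer.wrap(ca);
        CoderResult cr = dec.decode(in, out, true);
        if (cr.isUnderflow()) {
            cr = dec.flush(out);
        }
        if (!cr.isUnderflow()) {
            // Turns the result into the matching exception type.
            cr.throwException();
        }
        return out.position();
    }

    // Convenience wrapper for the demo: reports either the char count
    // or the simple name of the exception that was thrown.
    static String describe(byte[] ba) {
        try {
            return "decoded " + decode(ba, 0, ba.length, new char[ba.length]) + " chars";
        } catch (CharacterCodingException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new byte[] {'a', 'b', 'c'}));
        // 0xC3 starts a two-byte sequence, but 0x28 is not a continuation byte.
        System.out.println(describe(new byte[] {(byte) 0xC3, 0x28}));
    }
}
```

With this shape a caller such as ZipCoder could let the exception propagate or wrap it, instead of mapping -1 to a generic message.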


-Ulf
