On 28.04.2011 23:28, Xueming Shen wrote:
On 04/28/2011 01:55 PM, Ulf Zibis wrote:
On 28.04.2011 21:56, Xueming Shen wrote:
That said, you do have a point, we should do better even in the malformed case, ...
Yes, that's what I wanted to point out.
But I thought you could go one step further and declare bb as a member of UTF_8.Decoder. Then it
would have to be guaranteed that a decoder is used by only one thread at a time. I don't know
whether that is the case for the typical use cases?
Why do you want to "re-use" a ByteBuffer object across decode(byte[]...) invocations?
I don't see any benefit of doing that.
Thinking again, I see my error. It's not reusable, because its size is always different, so the
question about the benefit seems moot. The benefit could have been: if the strings are fairly
short AND the malformed case is fairly frequent, instantiating a new ByteBuffer on every call
could reduce the overall performance by some percentage.
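
To illustrate why such a wrapper is per-call state, here is a minimal sketch. The class and
method names (WrapPerCall, wrapInput) are hypothetical and not taken from the UTF_8.Decoder
source; the sketch only assumes that the fallback path wraps the caller's array with
ByteBuffer.wrap(byte[], int, int). The resulting buffer is bound to that one array and range.

```java
import java.nio.ByteBuffer;

public class WrapPerCall {
    // Hypothetical helper: wraps the caller's array without copying,
    // the way a fallback decoding path might do for each invocation.
    static ByteBuffer wrapInput(byte[] ba, int off, int len) {
        return ByteBuffer.wrap(ba, off, len);
    }

    public static void main(String[] args) {
        byte[] first = new byte[8];
        byte[] second = new byte[32];

        ByteBuffer b1 = wrapInput(first, 0, first.length);
        ByteBuffer b2 = wrapInput(second, 4, 16);

        // Each buffer is backed by the array it was created from.
        System.out.println(b1.array() == first);   // true
        // position = off, limit = off + len, capacity = array length
        System.out.println(b2.position() + " " + b2.limit() + " " + b2.capacity()); // 4 20 32
    }
}
```

A buffer created this way cannot be re-pointed at another array, so caching it in a decoder
field would not help when every decode(byte[]...) call passes a different array.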
In http://cr.openjdk.java.net/~mduigou/4884238/2/webrev/ I've seen the change to use a constant
Charset object instead of a constant charset name in some method calls. From your benchmark it
seems that using constant charset names gives a small performance gain (0..25 %), so I don't see
the benefit of the changes from 4884238, which go in the opposite direction.
That is a totally different topic:-)
Yes, you don't benefit from using a "Charset object" when doing
String.getBytes()/toCharArray(), because of our caching optimization in the StringCoding class.
But that is a pure implementation detail.
I think this fact should be mentioned in the javadoc of String.getBytes() etc. I guess the
average programmer would expect the StandardCharset.UTF_8 version to be faster than the csn
version.
It's safe to say that java.nio.cs.StandardCharset is not for String.getBytes()/toCharArray()
only, so the fact that the "cs" variant of String.getBytes()/toCharArray() is "slower" than its
"csn" variant arguably might not be very strong supporting material for that discussion:-)
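
For readers following along, the two String.getBytes() variants under discussion look like this.
This is only a sketch of the call shapes, not a benchmark: it assumes the constants class as it
later shipped in JDK 7 (java.nio.charset.StandardCharsets) rather than the
java.nio.cs.StandardCharset name used in this thread, and the class name GetBytesVariants is made
up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesVariants {
    // "csn" variant: charset given by name; declares a checked exception.
    static byte[] byName(String s) {
        try {
            return s.getBytes("UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
    }

    // "cs" variant: charset given as an object; no checked exception.
    static byte[] byCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "Gr\u00fc\u00dfe";
        byte[] a = byName(s);
        byte[] b = byCharset(s);
        // Both variants produce the same bytes; only the lookup path differs.
        System.out.println(Arrays.equals(a, b) + " " + a.length);
    }
}
```

Whatever the relative speed in a given JDK build, the results are identical, which is why the
performance difference is an implementation detail rather than part of the contract.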
So what prevents us from applying the same caching optimization in ZipCoder etc.?
- ZipCoder.isutf8 is unreadable. Better: isUTF8
- ArrayDecoder.decode(ba, 0, length, ca) could throw MalformedInput/UnmappableCharacterException
instead of returning -1. Benefits:
-- avoids translating -1 to IllegalArgumentException("MALFORMED") in ZipCoder etc.
-- more precise exception
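
A rough sketch of what "throw instead of returning -1" looks like with the public API only. It
does not touch the internal ArrayDecoder interface; StrictDecode, decodeStrict and isMalformed
are hypothetical names, and the sketch assumes a standard CharsetDecoder configured with
CodingErrorAction.REPORT, which makes decode(ByteBuffer) throw a CharacterCodingException
subclass on bad input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;

public class StrictDecode {
    // Decodes UTF-8 and reports bad input via an exception, not a sentinel value.
    static String decodeStrict(byte[] ba, int off, int len) throws CharacterCodingException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return dec.decode(ByteBuffer.wrap(ba, off, len)).toString();
    }

    // Convenience wrapper for callers that only need a yes/no answer.
    static boolean isMalformed(byte[] ba) {
        try {
            decodeStrict(ba, 0, ba.length);
            return false;
        } catch (MalformedInputException e) {
            return true;
        } catch (CharacterCodingException e) {
            return false; // some other coding error, e.g. unmappable character
        }
    }

    public static void main(String[] args) throws CharacterCodingException {
        System.out.println(decodeStrict(new byte[] {'o', 'k'}, 0, 2));
        // 0xC3 starts a two-byte sequence but is followed by a non-continuation byte.
        System.out.println(isMalformed(new byte[] {'a', (byte) 0xC3, 'b'}));
    }
}
```

A caller such as ZipCoder could then let the specific exception propagate, or wrap it, instead
of mapping -1 to a generic IllegalArgumentException.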
-Ulf