Re: StandardCharset vs. StandardCharsets

Xueming Shen Sat, 07 May 2011 11:56:05 -0700

On 05-07-2011 9:00 AM, Rémi Forax wrote:

On 05/07/2011 02:17 PM, Ulf Zibis wrote:
Hi all,
Please excuse that I still have trouble accepting this additional class, but +1 for the plural name.
If those charset constants are there, people _will use_ them without regard to the existing _performance disadvantages_.
A typical use case would be: String.getBytes(...)
On small strings there is a performance loss of up to 25% using the charset variant vs. the charset name variant. See:
http://cr.openjdk.java.net/~sherman/7040220/client
http://markmail.org/message/2tbas5skgkve52mz
http://markmail.org/thread/lnrozcbnpcl5kmzs
So I still think we should have the standard charset names as constants in class j.n.c.Charset:
public static final String UTF_8 = "UTF-8"; etc...
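
For reference, the two String.getBytes variants being compared can be sketched as follows. This is only an illustration: UTF_8_NAME is a hypothetical stand-in for the proposed String constant (it is not a JDK field), and the class and method names are made up for the example. Both calls should yield the same bytes; the discussion is only about their relative cost.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class GetBytesVariants {
    // Hypothetical constant in the style proposed above; not part of the JDK.
    public static final String UTF_8_NAME = "UTF-8";

    // Charset *name* variant: looks the charset up by name and
    // declares a checked exception for unknown names.
    static byte[] encodeByName(String s) {
        try {
            return s.getBytes(UTF_8_NAME);
        } catch (UnsupportedEncodingException e) {
            // UTF-8 must be supported by every Java platform implementation.
            throw new AssertionError("UTF-8 not available", e);
        }
    }

    // Charset *object* variant: no lookup by name, no checked exception.
    static byte[] encodeByCharset(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "R\u00e9mi";
        // Compare the results of the two variants.
        System.out.println(Arrays.equals(encodeByName(s), encodeByCharset(s)));
    }
}
```
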
Using objects instead of strings is a better design.
I see the fact that the String method variants that take a Charset are slower than the ones that take a String
as a performance bug, not as a design issue.
The String method that takes a Charset should reuse the class-local decoder
and the performance problem will go away.
(The analysis in StringCoding.decode(Charset, ...) (point 1) forgets that initializing a decoder also has a cost.)

I do know the "slowness" comes from initializing cs.newDecoder()/newEncoder() :-) But it is just not "easy" to cache the de/encoder in this case. There is no guarantee that the cs passed in this time is the same one you had last time, even if the name is the same. And even if the cs this time is indeed the same instance you had last time (and cached), there is no guarantee that the dec/enc returned from newDecoder()/newEncoder() this time will be the same as the one in your cache, until you invoke newDecoder()/newEncoder(), get the dec/enc and compare it to the one in your cache; but then why cache it :-)

Something you can do is cache only if the cs passed in is indeed one from our own charset repository (which can be trusted not to do something tricky). You can check this by invoking getClassLoader0() == null, which is expensive; I kind of remember that the measurements showed this might not be worth doing, last time I was there. Sure, those charsets in StandardCharsets can be treated specially, if desirable, probably only the ASCII, ISO-8859-1 and UTF-8 ones,

such as

if (cs == StandardCharsets.UTF_8 || cs == StandardCharsets.US_ASCII...) {
...
}
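
A minimal sketch of what such special-casing could look like, written against public APIs only. It is not the JDK's StringCoding code: the class name, the per-thread cache and the helper methods are all hypothetical, only UTF-8 is treated as "trusted" to keep it short, and ThreadLocal.withInitial assumes Java 8 or later. The identity comparison is what makes reuse safe here, since a Charset that is not one of the known constants always gets a fresh encoder.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class TrustedCharsetEncoding {
    // Hypothetical per-thread cache; encoders are not safe for concurrent use.
    private static final ThreadLocal<CharsetEncoder> UTF8_ENCODER =
        ThreadLocal.withInitial(() -> newReplacingEncoder(StandardCharsets.UTF_8));

    // Creates an encoder that substitutes its replacement bytes instead of failing.
    private static CharsetEncoder newReplacingEncoder(Charset cs) {
        return cs.newEncoder()
                 .onMalformedInput(CodingErrorAction.REPLACE)
                 .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    static byte[] encode(Charset cs, String s) {
        // Identity check: only the well-known constant takes the cached path.
        CharsetEncoder enc = (cs == StandardCharsets.UTF_8)
            ? UTF8_ENCODER.get()
            : newReplacingEncoder(cs); // untrusted or unknown: pay the setup cost
        try {
            // encode(CharBuffer) resets the encoder before encoding,
            // so a cached instance can be reused across calls.
            ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but the API declares it.
            throw new IllegalStateException(e);
        }
    }
}
```

Whether this beats simply calling newEncoder() every time would still have to be measured; the sketch only shows the shape of the check.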

I will do some measurements later to see whether separating the "else" part into a side method speeds things up a little. We can do that if the inlining does help, but not for 7 :-)

Thanks!
-Sherman

Rémi
PS: also the else part of if (c instanceof ArrayDecoder) should be in a side method to ease
the inlining of decode().
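
The "side method" idea can be sketched as below. This is an assumption-laden illustration rather than the actual StringCoding code: ArrayDecoder is JDK-internal, so the fast-path condition here is replaced by an identity check against StandardCharsets.ISO_8859_1, and the class and method names are hypothetical. The point is only the shape: the frequently called method stays small, and the general case lives in its own method.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SplitDecode {
    // Small hot method: handles the common case and delegates everything else.
    static String decode(Charset cs, byte[] in) {
        if (cs == StandardCharsets.ISO_8859_1) {
            char[] out = new char[in.length];
            for (int i = 0; i < in.length; i++) {
                out[i] = (char) (in[i] & 0xff); // ISO-8859-1 maps each byte to U+0000..U+00FF
            }
            return new String(out);
        }
        return decodeSlow(cs, in);
    }

    // Side method for the general ("else") case, kept out of decode()
    // so that decode() itself stays short.
    private static String decodeSlow(Charset cs, byte[] in) {
        try {
            return cs.newDecoder()
                     .onMalformedInput(CodingErrorAction.REPLACE)
                     .onUnmappableCharacter(CodingErrorAction.REPLACE)
                     .decode(ByteBuffer.wrap(in))
                     .toString();
        } catch (CharacterCodingException e) {
            // Not expected with REPLACE actions, but the API declares it.
            throw new IllegalStateException(e);
        }
    }
}
```

Whether the split actually helps the JIT inline decode() is exactly what would need to be measured.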
