Re: JDK 9 RFR of 8039474: sun.misc.CharacterDecoder.decodeBuffer should use getBytes(iso8859-1)

Xueming Shen Thu, 10 Apr 2014 12:14:43 -0700

On 04/10/2014 12:03 PM, Chris Hegarty wrote:

On 10 Apr 2014, at 19:50, Xueming Shen<[email protected]>  wrote:

On 04/10/2014 11:38 AM, Mike Duigou wrote:

On Apr 10 2014, at 11:08 , Chris Hegarty<[email protected]>   wrote:

On 10 Apr 2014, at 18:40, Mike Duigou<[email protected]>   wrote:

On Apr 10 2014, at 03:21 , Chris Hegarty<[email protected]>   wrote:

On 10 Apr 2014, at 11:03, Ulf Zibis<[email protected]>   wrote:

Hi Chris,

On 10.04.2014 11:04, Chris Hegarty wrote:

Trivially, you could (but of course do not have to) use 
java.nio.charset.StandardCharsets.ISO_8859_1 to avoid the cost of the 
String-to-Charset lookup.
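
For reference, a minimal sketch of the two String.getBytes overloads being compared. The class and method names (GetBytesOverloads, byName, byCharset) are made up for illustration and are not taken from the patch under review:

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class; only the two getBytes overloads are real JDK API.
class GetBytesOverloads {

    // Lookup by charset name: declares a checked exception.
    static byte[] byName(String s) {
        try {
            return s.getBytes("ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // ISO-8859-1 is required on every Java platform.
            throw new AssertionError(e);
        }
    }

    // Charset constant: no name lookup and no checked exception.
    static byte[] byCharset(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String s = "caf\u00e9";
        // Both overloads produce the same bytes; only the cost may differ.
        System.out.println(Arrays.equals(byName(s), byCharset(s)));
    }
}
```

Both calls return identical bytes, so the choice between them is about lookup and caching cost, not about the result.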

In earlier tests, Sherman and I found that the cost of initializing a new 
Charset object is higher than the cost of looking up an existing object in the 
cache.
And it is even better to use the same String instance for the lookup that was 
used to cache the charset.

Interesting… thanks for letting me know.  Presumably, there is an assumption 
that StandardCharsets is not initialized elsewhere, by another dependency.

Generally it's safe to assume that StandardCharsets will already be 
initialized. If it isn't initialized we should consider it an amortized cost.

In which case, why would the string version be more performant than the version 
that already takes the Charset? Doesn't the string version need to do a lookup?

There is a cache in StringCoding that is only used in the byte[] getBytes(String 
charsetName) case but not in the byte[] getBytes(Charset charset) case. The 
rationale in StringCoding::decode(Charset cs, byte[] ba, int off, int len) may 
need to be revisited, as it is certainly surprising that using the string 
constant charset name is faster than using the Charset constant.

It is surprising :-) In theory you can't cache the de/encoder of a charset from
the external world, as the same charset might return a different de/encoder next
time. So it was decided back then not to cache the de/encoder for an incoming
charset. It might be reasonable to cache those from StandardCharsets, though.
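
A rough sketch of what caching only for a StandardCharsets constant could look like. This is not the StringCoding code; TrustedEncoderCache, isCached and encode are hypothetical names, and the per-thread cache is just one possible choice. It compares by identity, so a charset from the external world never hits the cache:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch, not the JDK implementation.
class TrustedEncoderCache {

    // Encoders are not thread-safe, so keep one per thread.
    private static final ThreadLocal<CharsetEncoder> LATIN1 =
        ThreadLocal.withInitial(() -> newEncoder(StandardCharsets.ISO_8859_1));

    static CharsetEncoder newEncoder(Charset cs) {
        return cs.newEncoder()
                 .onMalformedInput(CodingErrorAction.REPLACE)
                 .onUnmappableCharacter(CodingErrorAction.REPLACE);
    }

    // Identity check on purpose: equals() only compares canonical names.
    static boolean isCached(Charset cs) {
        return cs == StandardCharsets.ISO_8859_1;
    }

    static byte[] encode(String s, Charset cs) {
        // Reuse the cached encoder for the trusted constant only;
        // any other charset gets a fresh encoder on every call.
        CharsetEncoder enc = isCached(cs) ? LATIN1.get() : newEncoder(cs);
        try {
            // encode(CharBuffer) resets the encoder before encoding.
            ByteBuffer bb = enc.encode(CharBuffer.wrap(s));
            byte[] out = new byte[bb.remaining()];
            bb.get(out);
            return out;
        } catch (CharacterCodingException e) {
            // Not expected: both error actions are REPLACE.
            throw new AssertionError(e);
        }
    }
}
```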

I would say that it is more than reasonable. ;-) And it is surprising to me too 
that this usage is not as fast as a constant string.


Charset.equals() does explicitly mention "same canonical name", as below:

    /**
     * Tells whether or not this object is equal to another.
     *
     * <p> Two charsets are equal if, and only if, they have the same canonical
     * names.  A charset is never equal to any other type of object. </p>
     *
     * @return <tt>true</tt> if, and only if, this charset is equal to the
     *          given object
     */


But it is very reasonable :-) to assume someone might pass in a home-made
charset implementation with the same canonical name as the one in our/jdk
charset repository. Then we have another debate on which one should be
used in this case.
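
To make the concern concrete, here is a small sketch of such a home-made charset. LookalikeCharset is a made-up class for illustration; it claims the canonical name "ISO-8859-1" but hands out US-ASCII coders, so it is equal to the JDK charset while encoding differently:

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical home-made charset reusing a JDK canonical name.
class LookalikeCharset extends Charset {

    LookalikeCharset() {
        super("ISO-8859-1", null); // same canonical name, no aliases
    }

    @Override
    public boolean contains(Charset cs) {
        return cs.equals(this);
    }

    // Deliberately different behaviour: delegate to the US-ASCII coders.
    @Override
    public CharsetDecoder newDecoder() {
        return StandardCharsets.US_ASCII.newDecoder();
    }

    @Override
    public CharsetEncoder newEncoder() {
        return StandardCharsets.US_ASCII.newEncoder();
    }

    public static void main(String[] args) {
        Charset fake = new LookalikeCharset();
        Charset jdk = StandardCharsets.ISO_8859_1;
        String s = "\u00e9"; // not representable in US-ASCII

        // Equal by the Charset.equals() contract, yet distinct objects.
        System.out.println(fake.equals(jdk));
        System.out.println(fake == jdk);
        // The encoded bytes differ between the two charsets.
        System.out.println(Arrays.equals(s.getBytes(fake), s.getBytes(jdk)));
    }
}
```

A cache keyed on equals() or on the canonical name would treat the two as interchangeable, which is the debate mentioned above.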

-Sherman
