Re: Constants for standard charsets -- CR #4884238

Mike Duigou Tue, 12 Apr 2011 10:39:38 -0700

On Apr 12 2011, at 03:33, Alan Bateman wrote:

> Alan Bateman wrote:
>> I see your mail in the archives:
>> 
>> http://mail.openjdk.java.net/pipermail/core-libs-dev/2011-April/006487.html 
>> 
>> but I didn't receive it. I had a similar issue yesterday on another list but 
>> I've no idea where the problem is.
>> 
>> -Alan
>> 
> Just a couple of initial comments on the webrev:
> 
> 1. In the standard charsets section of the class description then it might be 
> useful to include a reference to Charsets, maybe "The {@link Charsets} class 
> defines constants for each of the standard charsets".


OK
> 
> 2. @see Charsets.DEFAULT, I assume this should be @see 
> Charsets#DEFAULT_CHARSET

Correct. I changed it to DEFAULT_CHARSET and forgot to fix this link.

> 
> 3. Looks like Charsets is using 2 rather than 4-space indenting.

Oops, I will correct this.
> 
> 
> 4. It would be nice to update java.nio.file.Path's class description to 
> replace Charset.forName("UTF-8") with Charsets.UTF_8;

I will do so.
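
For illustration, a minimal sketch of the difference between looking a charset
up by name and referring to a constant. The Charsets class discussed in this
thread is the proposal under review, so the sketch does not assume it; it uses
java.nio.charset.StandardCharsets, which is what JDK 7 and later provide, as a
stand-in. The class and method names below are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8ConstantSketch {
    // Lookup by name: the string is only validated at run time, so a typo
    // surfaces as an exception when this line executes.
    static Charset byName() {
        return Charset.forName("UTF-8");
    }

    // Constant: a misspelled identifier is rejected by the compiler.
    // StandardCharsets stands in for the proposed Charsets class.
    static Charset byConstant() {
        return StandardCharsets.UTF_8;
    }

    public static void main(String[] args) {
        // Both expressions refer to the same charset.
        System.out.println(byName().equals(byConstant()));
    }
}
```

The constant form also avoids repeating a string literal at every call site.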

> I was thinking more about DEFAULT_CHARSET and I'm not sure that we really 
> need it. In the java.io package then all constructors that take a Charset 
> also have a constructor that uses the default charset, same thing in 
> java.lang.String and java.util.zip package. In javax.tools.JavaCompiler I see 
> that null can be used to select the default charset. In java.nio.file.Files 
> then we didn't include versions of readAllLines, newBufferedReader, etc. that 
> didn't take a Charset parameter.

I agree that requiring an explicit Charset is best because it makes it clear 
what charset is being used. For me, though, this argues for the DEFAULT_CHARSET 
declaration: when the default charset is being used, it's best for that to be 
obvious in the code. 

I always interpret content being accessed with the default charset in one of 
two ways:
- Content that's known to be private and that the JVM wrote itself. Useful for 
caches because it's assumed that the default charset is the most efficient for 
that platform and configuration.
- Content that's potentially uninterpretable because it has an unknown charset 
and the default charset is the fallback choice. In recent times I've considered 
switching to using UTF-8 for unknown content.

Charset.defaultCharset() is possibly just as clear. I personally would use 
the constant and use only Charsets constants for accessing content. 
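
A rough sketch of what "making the default explicit" could look like at a call
site. DEFAULT_CHARSET here is a hypothetical local constant named after the one
proposed in this thread, not an existing JDK field; the class and method names
are also made up for illustration.

```java
import java.nio.charset.Charset;

public class ExplicitDefaultSketch {
    // Hypothetical constant mirroring the proposed DEFAULT_CHARSET.
    // It simply captures the JVM's default charset once.
    static final Charset DEFAULT_CHARSET = Charset.defaultCharset();

    // Implicit: nothing at the call site says which charset is used.
    static String decodeImplicit(byte[] bytes) {
        return new String(bytes);
    }

    // Explicit: the same default charset, but the choice is visible
    // to anyone reading the code.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, DEFAULT_CHARSET);
    }
}
```

Both methods decode with the default charset; the second only differs in that
a reader (or a code search) can see that the default was chosen deliberately.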

> They can be added if needed but there is an argument that you really need to 
> know the charset when accessing a text file as it can be too fragile to 
> assume the default encoding (esp. with files that are shared between users, 
> applications,  or machines).

I wouldn't add them. Default charset content should never be shared between 
instances (though it frequently is).

When I have used the default charset it's usually been in mime type 
declarations for content encoded using the default charset. An example from 
JXTA:

private static final MimeMediaType DEFAULT_TEXT_ENCODING =
    new MimeMediaType(MimeMediaType.TEXT_DEFAULTENCODING,
        "charset=\"" + Charset.defaultCharset().name() + "\"", true);
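
A self-contained sketch of the same idea without the JXTA types, in case it
helps to see the parameter string on its own. The "text/plain" base type and
the class and method names are assumptions for illustration; only
Charset.defaultCharset().name() comes from the snippet above.

```java
import java.nio.charset.Charset;

public class DefaultEncodingParamSketch {
    // Builds a MIME parameter of the form charset="<name>" from the
    // JVM's default charset, as in the JXTA snippet.
    static String charsetParameter() {
        return "charset=\"" + Charset.defaultCharset().name() + "\"";
    }

    // "text/plain" is an assumed base type, not taken from JXTA.
    static String contentType() {
        return "text/plain; " + charsetParameter();
    }

    public static void main(String[] args) {
        // The printed value depends on the platform's default charset.
        System.out.println(contentType());
    }
}
```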

My goal in adding a DEFAULT_CHARSET constant was to make use of the default 
charset more explicit. I definitely don't want to do anything which encourages 
inappropriate use of the default charset.



Mike
