Here is the text from the J2SE 1.4.2 spec:
A charset name must begin with either a letter or a digit. The empty
string is not a legal charset name. Charset names are not
case-sensitive; that is, case is always ignored when comparing charset
names. Charset names generally follow the conventions documented in
/RFC 2278: IANA Charset Registration Procedures/
<http://ietf.org/rfc/rfc2278.txt>.
According to RFC 2278:
Finally, charsets being registered for use with the "text" media type
MUST have a primary name that conforms to the more restrictive syntax
of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
MIME extended parameter values [RFC-2184]. A combined ABNF definition
for such names is as follows:
    mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>
    cspecials    = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
                   <"> / "/" / "[" / "]" / "?" / "." / "=" / "*"
    CHAR         = <any ASCII character>        ; (  0-177,  0.-127.)
    SPACE        = <ASCII SP, space>            ; (     40,      32.)
    CTL          = <any ASCII control           ; (  0- 37,  0.- 31.)
                    character and DEL>          ; (    177,     127.)
If I have interpreted the above correctly, a name can start with any
ASCII character except SPACE (octal 40), the control characters (octal
0-37 and 177), and the cspecials listed above.
A "-" is octal 055 and an "_" is octal 137; neither falls in that
excluded set.
So a charset named "-UTF-8" or "_UTF-8" is not an illegal name under
the RFC.
It looks like the Java spec tightens the set of charset names accepted
by Java further, in that a name can only start with a letter or a
digit. How do we interpret *must*?
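To make the RFC reading concrete, here is a minimal sketch of a checker for the mime-charset rule quoted above. The class and method names are hypothetical. The cspecials entry that straddles the line break in the quoted ABNF is read as the double quote only; nothing else is guessed for it.

```java
final class MimeCharsetSyntax {
    // cspecials as quoted from RFC 2278 above. The entry split across the
    // line break is taken to be the double quote (<">) only (assumption).
    private static final String CSPECIALS = "()<>@,;:\"/[]?.=*";

    /** Returns true if name matches 1*<Any CHAR except SPACE, CTLs, and cspecials>. */
    static boolean isMimeCharset(String name) {
        if (name.length() == 0) {
            return false; // "1*" requires at least one character
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean ascii = c <= 0x7F;           // CHAR: octal 0-177
            boolean ctl = c < 0x20 || c == 0x7F; // CTL: octal 0-37 and 177
            boolean space = c == ' ';            // SPACE: octal 40
            if (!ascii || ctl || space || CSPECIALS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "-UTF-8", "_UTF-8", "UTF 8", "UTF.8"};
        for (String name : names) {
            System.out.println("\"" + name + "\" -> " + isMimeCharset(name));
        }
    }
}
```

Under this sketch a leading "-" or "_" is accepted, which is the point made above: the RFC syntax alone does not rule out "-UTF-8" or "_UTF-8".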
Richard Liang wrote:
Hello Tim,
I'm wondering why I did not just copy the first sentence. :-)
"A charset name **must** begin with either a letter or a digit." Does
this mean that a charset name which begins with neither a letter nor a
digit should be regarded as an illegal charset name?
Richard Liang
China Software Development Lab, IBM
Tim Ellison wrote:
Richard Liang wrote:
Hello Tim,
I think this is caused by a different understanding of the Java spec:
A charset name **must** begin with either a letter or a digit. The
empty string is not a legal charset name....
What do you think the implication of "must" is here? :-)
But the name isn't empty, it is "-UTF-8" ? I must be missing
something...
Regards,
Tim
Tim Ellison (JIRA) wrote:
[
http://issues.apache.org/jira/browse/HARMONY-68?page=comments#action_12366784
]
Tim Ellison commented on HARMONY-68:
------------------------------------
The test looks invalid to me. You should only expect a
java.nio.charset.IllegalCharsetNameException if the name itself
contains disallowed characters, and both underscore and dash are
permitted.
The code Charset.isSupported("-UTF-8")
should return false, not throw an exception.
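A small probe makes it easy to see which of the two behaviours a given runtime actually implements. This is only a sketch: the class name is hypothetical, and the result for names with a leading "-" or "_" depends on the class library in use, so no output is claimed for those.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class IsSupportedProbe {
    /** Returns "true", "false", or "IllegalCharsetNameException" for the given name. */
    static String probe(String name) {
        try {
            return String.valueOf(Charset.isSupported(name));
        } catch (IllegalCharsetNameException e) {
            return "IllegalCharsetNameException";
        }
    }

    public static void main(String[] args) {
        // "UTF-8" is a control (always supported); "UTF 8" contains a space,
        // which is not a legal charset name character.
        String[] names = {"UTF-8", "-UTF-8", "_US-ASCII", "UTF 8"};
        for (String name : names) {
            System.out.println("\"" + name + "\" -> " + probe(name));
        }
    }
}
```

Running this on each VM under discussion shows directly whether the name is rejected as illegal or merely reported as unsupported.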
java.nio.charset.Charset.isSupported(String charsetName) does not
throw IllegalCharsetNameException for spoiled standard charset name
-------------------------------------------------------------------------------------------------------------------------------------
Key: HARMONY-68
URL: http://issues.apache.org/jira/browse/HARMONY-68
Project: Harmony
Type: Bug
Components: Classlib
Reporter: Svetlana Samoilenko
Attachments: charset_patch.txt
According to the J2SE 1.4.2 specification for
Charset.isSupported(String charsetName), the method must throw
IllegalCharsetNameException "if the given charset name is illegal". A
legal charset name "must begin with either a letter or a digit". The
test listed below shows that no exception is thrown if a "-" or "_"
symbol is inserted before a standard charset name, for example
"-UTF-8" or "_US-ASCII".
Moreover, the method returns "true" in this case.
BEA also does not throw the exception, but returns "false".
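For comparison, the strict reading of the spec described in this report could be sketched as follows. This is not Harmony's or any vendor's implementation. The class and method names are hypothetical, and the set of characters allowed after the first position (letters, digits, "-", ".", ":", "_") is an assumption taken from the 1.4.2 Charset Javadoc rather than from anything quoted in this thread.

```java
import java.nio.charset.IllegalCharsetNameException;

final class StrictCharsetName {
    /** Strict reading: the first character must be a letter or a digit. */
    static boolean isLegal(String name) {
        if (name.length() == 0) {
            return false; // the empty string is not a legal charset name
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean letterOrDigit = (c >= 'A' && c <= 'Z')
                    || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9');
            if (letterOrDigit) {
                continue;
            }
            // Assumed punctuation set; allowed only after the first character.
            boolean punctuation = c == '-' || c == '.' || c == ':' || c == '_';
            if (punctuation && i > 0) {
                continue;
            }
            return false;
        }
        return true;
    }

    /** Throws IllegalCharsetNameException when the name fails the strict check. */
    static void check(String name) {
        if (!isLegal(name)) {
            throw new IllegalCharsetNameException(name);
        }
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "-UTF-8", "_US-ASCII"};
        for (String name : names) {
            System.out.println("\"" + name + "\" legal: " + isLegal(name));
        }
    }
}
```

Under this reading "-UTF-8" and "_US-ASCII" are rejected before any lookup happens, which is the behaviour the report expects from isSupported.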
Code to reproduce:

    import java.nio.charset.*;

    public class test2 {
        public static void main(String[] args) {
            // string starts with neither a letter nor a digit
            boolean sup = false;
            try {
                sup = Charset.isSupported("-UTF-8");
                System.out.println("***BAD. should be exception; sup=" + sup);
                sup = Charset.isSupported("_US-ASCII");
                System.out.println("***BAD. should be exception; sup=" + sup);
            } catch (IllegalCharsetNameException e) {
                System.out.println("***OK. Expected IllegalCharsetNameException " + e);
            }
        }
    }

Steps to Reproduce:
1. Build Harmony (check-out on 2006-01-30) j2se subset as described in
   README.txt.
2. Compile test2.java using BEA 1.4 javac
       javac -d . test2.java
3. Run java using compatible VM (J9)
       java -showversion test2

Output:

    C:\tmp>C:\jrockit-j2sdk1.4.2_04\bin\java.exe -showversion test2
    java version "1.4.2_04"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
    BEA WebLogic JRockit(TM) 1.4.2_04 JVM (build ari-31788-20040616-1132-win-ia32, Native Threads, GC strategy: parallel)
    ***BAD. should be exception; sup=false
    ***BAD. should be exception; sup=false

    C:\tmp>C:\harmony\trunk\deploy\jre\bin\java -showversion test2
    (c) Copyright 1991, 2005 The Apache Software Foundation or its licensors, as applicable.
    ***BAD. should be exception; sup=true
    ***BAD. should be exception; sup=true

Suggested junit test case:
------------------------ CharsetTest.java -------------------------------------------------

    import java.nio.charset.*;
    import junit.framework.*;

    public class CharsetTest extends TestCase {
        public static void main(String[] args) {
            junit.textui.TestRunner.run(CharsetTest.class);
        }

        public void test_isSupported() {
            boolean sup = false;
            // string starts with neither a letter nor a digit
            try {
                sup = Charset.isSupported("-UTF-8");
                fail("***BAD. should be exception IllegalCharsetNameException");
            } catch (IllegalCharsetNameException e) {
                // expected
            }
            // string starts with neither a letter nor a digit
            try {
                sup = Charset.isSupported("_US-ASCII");
                fail("***BAD. should be exception IllegalCharsetNameException");
            } catch (IllegalCharsetNameException e) {
                // expected
            }
        }
    }

--
Karan Singh