Here is the text from the J2SE 1.4.2 spec:
A charset name must begin with either a letter or a digit. The empty string is not a legal charset name. Charset names are not case-sensitive; that is, case is always ignored when comparing charset names. Charset names generally follow the conventions documented in /RFC 2278: IANA Charset Registration Procedures/ <http://ietf.org/rfc/rfc2278.txt>.
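A minimal sketch of only the two constraints quoted above: the name is non-empty, and its first character is a letter or a digit. It assumes that "letter or digit" means the ASCII ranges A-Z, a-z and 0-9. The class and method names are hypothetical and are not part of java.nio.charset.

```java
// Hypothetical helper, not part of java.nio.charset.
final class JavaCharsetNameRule {

    // Assumption: "letter or digit" means ASCII A-Z, a-z and 0-9.
    private static boolean isAsciiLetterOrDigit(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9');
    }

    // Checks only the two quoted constraints: non-empty, and the first
    // character is a letter or a digit. Other characters are not inspected.
    static boolean satisfiesFirstCharacterRule(String name) {
        if (name == null || name.isEmpty()) {
            return false; // the empty string is not a legal charset name
        }
        return isAsciiLetterOrDigit(name.charAt(0));
    }

    public static void main(String[] args) {
        String[] names = {"UTF-8", "-UTF-8", "_US-ASCII", "8859_1", ""};
        for (String n : names) {
            System.out.println("\"" + n + "\" -> " + satisfiesFirstCharacterRule(n));
        }
    }
}
```

Read this way, "UTF-8" and "8859_1" pass, while "-UTF-8", "_US-ASCII" and the empty string fail.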
According to RFC 2278:

  Finally, charsets being registered for use with the "text" media type
  MUST have a primary name that conforms to the more restrictive syntax
  of the charset field in MIME encoded-words [RFC-2047, RFC-2184] and
  MIME extended parameter values [RFC-2184]. A combined ABNF definition
  for such names is as follows:

  mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>

  cspecials    = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" /
                 "\" / <"> / "/" / "[" / "]" / "?" / "." / "=" / "*"

  CHAR         =  <any ASCII character>        ; (  0-177,  0.-127.)
  SPACE        =  <ASCII SP, space>            ; (     40,      32.)
  CTL          =  <any ASCII control           ; (  0- 37,  0.- 31.)
                   character and DEL>          ; (    177,     127.)

If I have interpreted the above correctly, it means that a name can contain (and start with) any ASCII character except SPACE (octal 40), the control characters (octal 0-37 and 177) and the cspecials. A "-" is octal 055 and an "_" is octal 137, so neither falls under this exclusion list. So, in principle, a charset named "-UTF-8" or "_UTF-8" is not an illegal name.
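To double-check this reading, here is a minimal sketch that tests a name against the mime-charset ABNF quoted above. The class name is hypothetical; the cspecials set follows RFC 2278, which also lists the backslash ("\") among the excluded characters.

```java
// Hypothetical helper: checks a name against the RFC 2278 mime-charset ABNF.
final class Rfc2278MimeCharset {

    // cspecials as listed in RFC 2278 (backslash and double quote escaped for Java)
    private static final String CSPECIALS = "()<>@,;:\\\"/[]?.=*";

    // mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>
    static boolean isMimeCharset(String name) {
        if (name == null || name.isEmpty()) {
            return false; // "1*" requires at least one character
        }
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            boolean isChar = c <= 127;            // CHAR: octal 0-177
            boolean isSpace = c == ' ';           // SPACE: octal 40
            boolean isCtl = c <= 31 || c == 127;  // CTL: octal 0-37, plus DEL (177)
            if (!isChar || isSpace || isCtl || CSPECIALS.indexOf(c) >= 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] names = {"-UTF-8", "_UTF-8", "UTF-8", "UTF 8", "ISO.8859", ""};
        for (String n : names) {
            System.out.println("\"" + n + "\" -> " + isMimeCharset(n));
        }
    }
}
```

For "-UTF-8" and "_UTF-8" this returns true, which supports the reading above; "UTF 8" and "ISO.8859" return false because of the space and the period.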

So it looks like the Java spec further tightens the charset names Java accepts, in that a name may only start with a letter or a digit. How should we interpret *must*?


Richard Liang wrote:

Hello Tim,

I'm wondering why I did not just copy the first sentence. :-)

"A charset name **must** begin with either a letter or a digit." Does this mean that a charset name which begins with neither a letter nor a digit should be regarded as an illegal charset name?


Richard Liang
China Software Development Lab, IBM



Tim Ellison wrote:

Richard Liang wrote:
Hello Tim,

I think this is caused by a different understanding of the Java spec:

A charset name **must** begin with either a letter or a digit. The empty
string is not a legal charset name....

What do you think is the implication of "must" here? :-)


But the name isn't empty; it is "-UTF-8". I must be missing something...

Regards,
Tim


Tim Ellison (JIRA) wrote:
    [ http://issues.apache.org/jira/browse/HARMONY-68?page=comments#action_12366784 ]
Tim Ellison commented on HARMONY-68:
------------------------------------

The test looks invalid to me. You should only expect a
java.nio.charset.IllegalCharsetNameException if the name itself
contains disallowed characters, and both underscore and dash are
permitted.

The code Charset.isSupported("-UTF-8")

should return false, not throw an exception.
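A rough sketch of the behaviour Tim describes, under stated assumptions: the method below is hypothetical (it is not how Harmony or any JDK implements Charset.isSupported), it assumes the J2SE 1.4.2 set of legal name characters (letters, digits, '-', '.', ':' and '_') with no first-character rule, and it looks names up through Charset.availableCharsets() and Charset.aliases(), comparing case-insensitively because charset names are not case-sensitive.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.util.Map;

// Hypothetical sketch of the lenient reading: throw only for disallowed
// characters, otherwise report whether a charset or alias with that name exists.
final class LenientIsSupported {

    // Assumption: J2SE 1.4.2 legal characters; no first-character rule is applied.
    private static boolean isAllowedChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == ':' || c == '_';
    }

    static boolean isSupportedLenient(String name) {
        if (name == null) {
            throw new IllegalArgumentException("Null charset name");
        }
        if (name.isEmpty()) {
            throw new IllegalCharsetNameException(name); // empty is never legal
        }
        for (int i = 0; i < name.length(); i++) {
            if (!isAllowedChar(name.charAt(i))) {
                throw new IllegalCharsetNameException(name);
            }
        }
        // availableCharsets() is keyed by canonical name; aliases are checked too.
        for (Map.Entry<String, Charset> e : Charset.availableCharsets().entrySet()) {
            if (e.getKey().equalsIgnoreCase(name)) {
                return true;
            }
            for (String alias : e.getValue().aliases()) {
                if (alias.equalsIgnoreCase(name)) {
                    return true;
                }
            }
        }
        return false; // well-formed but unknown name
    }

    public static void main(String[] args) {
        System.out.println(isSupportedLenient("UTF-8"));
        System.out.println(isSupportedLenient("-UTF-8"));
        try {
            isSupportedLenient("UTF 8");
        } catch (IllegalCharsetNameException e) {
            System.out.println("rejected: " + e.getCharsetName());
        }
    }
}
```

Under this reading, "-UTF-8" passes the character check and simply returns false, while "UTF 8" is rejected because of the space.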

java.nio.charset.Charset.isSupported(String charsetName) does not
throw IllegalCharsetNameException for spoiled standard charset name
-------------------------------------------------------------------------------------------------------------------------------------


         Key: HARMONY-68
         URL: http://issues.apache.org/jira/browse/HARMONY-68
     Project: Harmony
        Type: Bug
  Components: Classlib
    Reporter: Svetlana Samoilenko
 Attachments: charset_patch.txt

According to the J2SE 1.4.2 specification for Charset.isSupported(String
charsetName), the method must throw IllegalCharsetNameException "if
the given charset name is illegal". A legal charset name "must begin
with either a letter or a digit". The test listed below shows that
no exception is thrown if "-" or "_" is inserted before a
standard charset name, for example "-UTF-8" or "_US-ASCII".
Moreover, the method returns "true" in this case.
BEA also does not throw the exception, but it returns "false".
Code to reproduce:

    import java.nio.charset.*;

    public class test2 {
        public static void main(String[] args) {
            // string starts with neither a letter nor a digit
            boolean sup = false;
            try {
                sup = Charset.isSupported("-UTF-8");
                System.out.println("***BAD. should be exception; sup=" + sup);
                sup = Charset.isSupported("_US-ASCII");
                System.out.println("***BAD. should be exception; sup=" + sup);
            } catch (IllegalCharsetNameException e) {
                System.out.println("***OK. Expected IllegalCharsetNameException " + e);
            }
        }
    }

Steps to Reproduce:
1. Build the Harmony j2se subset (checked out on 2006-01-30) as described in README.txt.
2. Compile test2.java using BEA 1.4 javac:
       javac -d . test2.java
3. Run java using a compatible VM (J9):
       java -showversion test2
Output:

    C:\tmp>C:\jrockit-j2sdk1.4.2_04\bin\java.exe -showversion test2
    java version "1.4.2_04"
    Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_04-b05)
    BEA WebLogic JRockit(TM) 1.4.2_04 JVM (build ari-31788-20040616-1132-win-ia32, Native Threads, GC strategy: parallel)
    ***BAD. should be exception; sup=false
    ***BAD. should be exception; sup=false

    C:\tmp>C:\harmony\trunk\deploy\jre\bin\java -showversion test2
    (c) Copyright 1991, 2005 The Apache Software Foundation or its licensors, as applicable.
    ***BAD. should be exception; sup=true
    ***BAD. should be exception; sup=true
Suggested JUnit test case:

------------------------ CharsetTest.java -------------------------------------------------

    import java.nio.charset.*;
    import junit.framework.*;

    public class CharsetTest extends TestCase {
        public static void main(String[] args) {
            junit.textui.TestRunner.run(CharsetTest.class);
        }

        public void test_isSupported() {
            boolean sup = false;
            // string starts with neither a letter nor a digit
            try {
                sup = Charset.isSupported("-UTF-8");
                fail("***BAD. should be exception IllegalCharsetNameException");
            } catch (IllegalCharsetNameException e) {
                // expected
            }
            // string starts with neither a letter nor a digit
            try {
                sup = Charset.isSupported("_US-ASCII");
                fail("***BAD. should be exception IllegalCharsetNameException");
            } catch (IllegalCharsetNameException e) {
                // expected
            }
        }
    }






--
Karan Singh