Mark Swanson wrote:
Where are you getting your list from? On 1.5.0_07, at least on Linux,
UnicodeBig certainly does seem to be defined:
BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer ([EMAIL PROTECTED])
bsh % print(System.getProperty("java.version"));
1.5.0_07
bsh % b = "TEST".getBytes("UnicodeBig");
bsh % print(b.length);
10
bsh % print(new String(b, "UnicodeBig"));
TEST
Neat. I just ran that under BeanShell and it worked fine (Linux,
1.5.0_08)
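One detail in that session may matter for the rest of the discussion: b.length is 10 for a four-character string, which is what you would expect if a two-byte byte-order mark were written ahead of the eight bytes of character data. Here is a minimal sketch that compares the standard names UTF-16 and UTF-16BE. The class and method names are made up for illustration, and it is only an assumption that "UnicodeBig" behaves like UTF-16 (BOM included) on the JVMs discussed here.

```java
import java.io.UnsupportedEncodingException;

final class Utf16Lengths {
    // Encodes text with the named encoding; wraps the checked exception
    // so callers do not need a throws clause.
    static byte[] encode(String text, String encoding) {
        try {
            return text.getBytes(encoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + encoding, e);
        }
    }

    // Renders bytes as space-separated upper-case hex, e.g. "FE FF 00 54".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // UTF-16BE writes the code units only; UTF-16 prepends a big-endian BOM.
        System.out.println("UTF-16BE: " + hex(encode("TEST", "UTF-16BE")));
        System.out.println("UTF-16:   " + hex(encode("TEST", "UTF-16")));
    }
}
```

If "UnicodeBig" on your JVM gives the 10-byte form, it is not byte-for-byte the same as UTF-16BE on output, even if both decode to the same text.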
If you run my code below you'll see that "UnicodeBig" does not exist.
I ran your code and got the same result here:
charset:UTF-16BE
alias:X-UTF-16BE
alias:UnicodeBigUnmarked
alias:UTF_16BE
alias:ISO-10646-UCS-2
XmlBeans _only_ defines UNICODEBIG to correspond to UTF-16BE. So it
still seems impossible to support UTF-16BE (IANA ISO-10646-UCS-2).
As I said above, UNICODEBIG is a java alias for UTF-16BE, so why
shouldn't XmlBeans define this mapping?
It's not (see above).
NOTE: I initially tried (and would prefer) to use ISO-10646-UCS-2 as
this is identical in IANA and Java. It does not work. No IANA/Java
translation is required and XmlBeans still gets it wrong.
Why not just use UTF-16BE, which is the canonical IANA name for this
character set? Does XmlBeans still have the wrong behavior, for
either incoming or outgoing documents, if you specify UTF-16BE as the
charset? If so, then I agree we have a bug.
Yes, I tried this and it didn't work.
Ok, then I agree, there is a problem. Was it the incoming or outgoing
or both that did not work? What was the failure mode?
I will post the code on Tuesday.
This is a bug.
I'm not an XmlBeans developer, but I'm not sure I agree (unless, as I
said, there is a problem with specifying UTF-16BE). That said, there
may be, and probably are, some mappings missing for certain IANA
aliases.
Since 'UNICODEBIG' is not a Java or IANA character set name, perhaps the
bug is a simple typo in the character set name. I can try fixing this
and testing it on Tuesday.
bsh % b = "TEST".getBytes("UNICODEBIG");
// Error: // Uncaught Exception: target exception : at Line: 2 : in
file: <unknown file> : .getBytes ( "UNICODEBIG" )
Interesting, so UNICODEBIG is not valid, but UnicodeBig is (as shown
above).
That is interesting. The javadocs state (and the code backs this up)
that charset aliases are not case sensitive.
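To see where the case sensitivity comes from, it may help to probe a name through both lookup paths: java.nio.charset.Charset and String.getBytes(String). The sketch below does that. The class and method names are hypothetical, and the results for "UnicodeBig" and "UNICODEBIG" will depend on the JVM version, so no expected output is given.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

final class EncodingNameProbe {
    // True if java.nio.charset recognises the name (canonical name or alias).
    static boolean knownToNio(String name) {
        try {
            return Charset.isSupported(name);
        } catch (IllegalCharsetNameException e) {
            // The name contains characters that no charset name may use.
            return false;
        }
    }

    // True if String.getBytes(String) accepts the name.
    static boolean knownToGetBytes(String name) {
        try {
            "TEST".getBytes(name);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = { "UTF-16BE", "utf-16be", "UnicodeBig", "UNICODEBIG" };
        for (String name : names) {
            System.out.println(name
                    + " nio=" + knownToNio(name)
                    + " getBytes=" + knownToGetBytes(name));
        }
    }
}
```

A name that getBytes accepts but java.nio does not would suggest it is being resolved somewhere other than the java.nio alias tables, where the documented case-insensitivity applies.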
Yeah, that is what I thought too... I think there may be some
undocumented charsets internal to the JVM. Here are the classes loaded
when using "UnicodeBig":
bsh % b = "TEST".getBytes("UnicodeBig");
[...]
[Loaded sun.io.CharToByteConverter from
/usr/lib/jvm/java-1.5.0-sun-1.5.0.07/jre/lib/rt.jar]
[Loaded sun.io.CharacterEncoding from shared objects file]
[Loaded sun.io.CharToByteUnicode from
/usr/lib/jvm/java-1.5.0-sun-1.5.0.07/jre/lib/rt.jar]
[Loaded sun.io.CharToByteUnicodeBig from
/usr/lib/jvm/java-1.5.0-sun-1.5.0.07/jre/lib/rt.jar]
FYI the Java charset/aliases I posted were for Java 1.5. For Java 1.4.1
they are:
charset:UTF-16BE
alias:X-UTF-16BE
alias:UTF_16BE
alias:ISO-10646-UCS-2
For Java 1.6 they are:
charset:UTF-16BE
alias:X-UTF-16BE
alias:UTF_16BE
alias:ISO-10646-UCS-2
alias:UnicodeBigUnmarked
Can I ask where you are getting this list from?
Sure - from the Java runtime itself:
import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.Set;
import java.util.SortedMap;

public class DisplayCharsets {
    public static void main(String[] args) throws Exception {
        // Every charset the runtime exposes through java.nio,
        // keyed by canonical name. Raw types keep this compilable on 1.4.
        SortedMap availableCharsets = Charset.availableCharsets();
        Iterator i = availableCharsets.keySet().iterator();
        while (i.hasNext()) {
            String charsetName = (String) i.next();
            System.out.println("charset:" + charsetName);
            Charset charset = Charset.forName(charsetName);
            // Print each alias registered for this charset.
            Set aliases = charset.aliases();
            Iterator j = aliases.iterator();
            while (j.hasNext()) {
                System.out.println(" alias:" + j.next());
            }
        }
    }
}
Ran it via BeanShell, and my results are the same as yours.
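For checking a single name rather than reading the full dump, something like the following may be handier. It is only a sketch: the class and method names are invented for illustration, and it reports only what java.nio.charset knows about, so it says nothing about names the JVM may resolve elsewhere.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

final class CanonicalCharsetName {
    // Returns the canonical java.nio name for a charset name or alias,
    // or null if the runtime does not recognise it.
    static String canonicalName(String name) {
        try {
            return Charset.forName(name).name();
        } catch (IllegalCharsetNameException e) {
            return null; // syntactically invalid charset name
        } catch (UnsupportedCharsetException e) {
            return null; // valid syntax, but unknown to this runtime
        }
    }

    // True if both names are recognised and resolve to the same charset.
    static boolean sameCharset(String a, String b) {
        String canonicalA = canonicalName(a);
        return canonicalA != null && canonicalA.equals(canonicalName(b));
    }

    public static void main(String[] args) {
        String[] names = { "ISO-10646-UCS-2", "UnicodeBigUnmarked", "UnicodeBig", "UNICODEBIG" };
        for (String name : names) {
            System.out.println(name + " -> " + canonicalName(name));
        }
    }
}
```

Running main on each JVM in question would show directly which of the disputed names map to UTF-16BE there.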
Cheers,
Raman