Re: XmlBeans does not support character encodings other than UTF-8

Raman Gupta Mon, 25 Sep 2006 10:08:40 -0700

Mark Swanson wrote:

Where are you getting your list from?  On 1.5.0_07, at least on Linux,
UnicodeBig certainly does seem to be defined:

BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer ([EMAIL PROTECTED])
bsh % print(System.getProperty("java.version"));
1.5.0_07
bsh % b = "TEST".getBytes("UnicodeBig");
bsh % print(b.length);
10
bsh % print(new String(b, "UnicodeBig"));
TEST
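A side note on the length of 10 printed above: "TEST" is four characters, so an unmarked UTF-16BE encoding would be 8 bytes. Ten bytes is consistent with a 2-byte byte order mark being prepended, which suggests (this is an inference, not something confirmed in this thread) that UnicodeBig behaves like marked UTF-16 rather than like UTF-16BE. A minimal sketch to compare the two, using only java.nio and the standard names UTF-16 and UTF-16BE; the class and method names are made up for illustration:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class Utf16ByteCount {

    // Number of bytes the named charset produces for the given text.
    static int encodedLength(String text, String charsetName) {
        ByteBuffer encoded = Charset.forName(charsetName).encode(text);
        return encoded.remaining();
    }

    // True if the encoded form starts with the big-endian byte order mark FE FF.
    static boolean startsWithBigEndianBom(String text, String charsetName) {
        ByteBuffer encoded = Charset.forName(charsetName).encode(text);
        return encoded.remaining() >= 2
                && encoded.get(0) == (byte) 0xFE
                && encoded.get(1) == (byte) 0xFF;
    }

    public static void main(String[] args) {
        String[] names = { "UTF-16", "UTF-16BE" };
        for (int k = 0; k < names.length; k++) {
            System.out.println(names[k] + ": "
                    + encodedLength("TEST", names[k]) + " bytes, BOM="
                    + startsWithBigEndianBom("TEST", names[k]));
        }
    }
}
```

If that inference holds, a document written as "UnicodeBig" would carry a BOM while one written as UTF-16BE would not, which may matter when comparing incoming and outgoing documents.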

Neat. I just ran that under BeanShell and it worked fine (Linux,1.5.0_08)

If you run my code below you'll see that "UnicodeBig" does not exist.


I ran your code and same result here.

charset:UTF-16BE
  alias:X-UTF-16BE
  alias:UnicodeBigUnmarked
  alias:UTF_16BE
  alias:ISO-10646-UCS-2

XmlBeans _only_ defines UNICODEBIG to correspond to UTF-16BE. So it
still seems impossible to support UTF-16BE (IANA ISO-10646-UCS-2).

As I said above, UNICODEBIG is a java alias for UTF-16BE, so why
shouldn't XmlBeans define this mapping?

It's not (see above).

NOTE: I initially tried (and would prefer) to use ISO-10646-UCS-2 as
this is identical in IANA and Java. It does not work. No IANA/Java
translation is required and XmlBeans still gets it wrong.

Why not just use UTF-16BE, which is the canonical IANA name for this
character set?  Does XmlBeans still have the wrong behavior, for
either incoming or outgoing documents, if you specify UTF-16BE as the
charset?  If so, then I agree we have a bug.

Yes, I tried this and it didn't work.


Ok, then I agree, there is a problem. Was it the incoming or outgoing
or both that did not work? What was the failure mode?


I will post the code on Tuesday.

This is a bug.

I'm not an XML beans developer, but I'm not sure I agree (unless, as I
said there is a problem with specifying UTF-16BE). Though certainly
there may be, and probably are, some mappings missing for certain IANA
aliases.
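To make the "missing mappings" point concrete, here is a rough sketch of resolving whatever name a document declares to the JVM's canonical charset name, so that aliases do not each need their own table entry. This is only an illustration using java.nio; it is not how XmlBeans does its lookup, and canonicalName is a hypothetical helper:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.UnsupportedCharsetException;

public class CanonicalCharsetName {

    // Returns the JVM's canonical name for a charset name or alias,
    // or null if this JVM does not recognise the name at all.
    static String canonicalName(String nameOrAlias) {
        try {
            return Charset.forName(nameOrAlias).name();
        } catch (IllegalCharsetNameException e) {
            return null; // name contains characters that are not legal in a charset name
        } catch (UnsupportedCharsetException e) {
            return null; // legal name, but no such charset in this JVM
        }
    }

    public static void main(String[] args) {
        // Names taken from this thread; results depend on the JVM version.
        String[] names = { "UTF-16BE", "ISO-10646-UCS-2", "UnicodeBigUnmarked",
                           "UnicodeBig", "UNICODEBIG" };
        for (int k = 0; k < names.length; k++) {
            System.out.println(names[k] + " -> " + canonicalName(names[k]));
        }
    }
}
```

Any name that comes back null here would still need an explicit mapping, since the JVM offers nothing to resolve it to.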

Since 'UNICODEBIG' is not a Java or IANA character set name, perhaps the
bug is a simple typo in the character set name. I can try fixing this
and testing it on Tuesday.


bsh % b = "TEST".getBytes("UNICODEBIG");
// Error: // Uncaught Exception: target exception : at Line: 2 : in
file: <unknown file> : .getBytes ( "UNICODEBIG" )

Interesting, so UNICODEBIG is not valid, but UnicodeBig is (as shown
above).

That is interesting. The javadocs (and the code backs this up) state
that the alias is not case sensitive.

Yeah, that is what I thought too... I think there may be some
undocumented charsets internal to the JVM. Here are the classes loaded
when using "UnicodeBig":


bsh % b = "TEST".getBytes("UnicodeBig");
[...]

[Loaded sun.io.CharToByteConverter from /usr/lib/jvm/java-1.5.0-sun-1.5.0.07/jre/lib/rt.jar]
[Loaded sun.io.CharacterEncoding from shared objects file]
[Loaded sun.io.CharToByteUnicode from /usr/lib/jvm/java-1.5.0-sun-1.5.0.07/jre/lib/rt.jar]
[Loaded sun.io.CharToByteUnicodeBig from /usr/lib/jvm/java-1.5.0-sun-1.5.0.07/jre/lib/rt.jar]
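On the case-sensitivity question, a small sketch that checks whether two spellings resolve to the same charset through java.nio. Note the caveat: this only exercises the java.nio.charset lookup, whereas the trace above shows String.getBytes(String) on 1.5 going through sun.io converters, so the two paths may not agree. The class and method names are invented for illustration:

```java
import java.nio.charset.Charset;

public class CharsetCaseCheck {

    // True if both names resolve to the same charset via java.nio.
    // Throws an unchecked exception if either name is unknown to the JVM.
    static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        System.out.println("UTF-16BE vs utf-16be: "
                + sameCharset("UTF-16BE", "utf-16be"));
        // Whether these two are known at all varies by JVM, so only report it.
        System.out.println("UnicodeBig supported: "
                + Charset.isSupported("UnicodeBig"));
        System.out.println("UNICODEBIG supported: "
                + Charset.isSupported("UNICODEBIG"));
    }
}
```

If the last two lines disagree on a given JVM, that would point at the legacy converter lookup rather than the documented Charset API.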

FYI the Java charset/aliases I posted were for Java 1.5. For Java 1.4.1
they are:

charset:UTF-16BE
  alias:X-UTF-16BE
  alias:UTF_16BE
  alias:ISO-10646-UCS-2

For Java 1.6 they are:

charset:UTF-16BE
  alias:X-UTF-16BE
  alias:UTF_16BE
  alias:ISO-10646-UCS-2
  alias:UnicodeBigUnmarked


Can I ask where you are getting this list from?


Sure - from the Java runtime itself:

import java.nio.charset.Charset;
import java.util.Iterator;
import java.util.Set;
import java.util.SortedMap;

public class DisplayCharsets {

        public static void main(String[] args) throws Exception {
                // Canonical charset name -> Charset, for every charset this JVM supports
                SortedMap availableCharsets = Charset.availableCharsets();

                Iterator i = availableCharsets.keySet().iterator();
                for (; i.hasNext(); ) {
                        String charsetName = (String)i.next();
                        System.out.println("charset:" + charsetName);
                        Charset charset = Charset.forName(charsetName);
                        Set aliases = charset.aliases();
                        //System.out.println("aliases:" + aliases.size());

                        // Print each alias under its charset
                        Iterator j = aliases.iterator();
                        for (; j.hasNext(); )
                                System.out.println(" alias:" + j.next());
                }
        }
}



Ran it via beanshell, and my results are the same as yours.

Cheers,
Raman


