Re: XmlBeans does not support character encodings other than UTF-8

Mark Swanson Mon, 25 Sep 2006 09:19:50 -0700

Where are you getting your list from?  On 1.5.0_07, at least on Linux,
UnicodeBig certainly does seem to be defined:


BeanShell 2.0-0.b1.7jpp - by Pat Niemeyer ([EMAIL PROTECTED])
bsh % print(System.getProperty("java.version"));
1.5.0_07
bsh % b = "TEST".getBytes("UnicodeBig");
bsh % print(b.length);
10
bsh % print(new String(b, "UnicodeBig"));
TEST
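The 10-byte result is worth unpacking. "TEST" is 4 characters, so a plain big-endian UTF-16 encoding is 8 bytes, and the extra 2 bytes are a byte-order mark (FE FF). As far as I know, on recent JDKs "UnicodeBig" is an alias of the "UTF-16" charset (big-endian, writes a BOM), while "UnicodeBigUnmarked" belongs to UTF-16BE (no BOM). That would explain why UnicodeBig does not show up in the UTF-16BE alias list. The following is a minimal sketch that shows the difference, assuming a reasonably recent JDK; the class name BomCheck is made up for illustration:

```java
import java.nio.charset.Charset;

public class BomCheck {

    // Render bytes as space-separated upper-case hex pairs, e.g. "FE FF 00 54".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = { "UnicodeBig", "UnicodeBigUnmarked", "UTF-16BE" };
        for (String name : names) {
            Charset cs = Charset.forName(name);
            byte[] b = "TEST".getBytes(cs);
            // cs.name() is the canonical charset that the requested name resolved to
            System.out.println(name + " -> " + cs.name() + ", " + b.length + " bytes: " + hex(b));
        }
    }
}
```

If the canonical name printed for UnicodeBig differs from the one for UnicodeBigUnmarked, the two names are not interchangeable, even though both produce big-endian output.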


Neat. I just ran that under BeanShell and it worked fine (Linux, 1.5.0_08).
If you run my code below, you'll see that "UnicodeBig" does not exist.

charset:UTF-16BE
  alias:X-UTF-16BE
  alias:UnicodeBigUnmarked
  alias:UTF_16BE
  alias:ISO-10646-UCS-2

XmlBeans _only_ defines UNICODEBIG to correspond to UTF-16BE. So it
still seems impossible to support UTF-16BE (IANA ISO-10646-UCS-2).

As I said above, UNICODEBIG is a java alias for UTF-16BE, so why
shouldn't XmlBeans define this mapping?

It's not (see above).

NOTE: I initially tried (and would prefer) to use ISO-10646-UCS-2 as
this is identical in IANA and Java. It does not work. No IANA/Java
translation is required and XmlBeans still gets it wrong.

Why not just use UTF-16BE, which is the canonical IANA name for this
character set?  Does XmlBeans still have the wrong behavior, for
either incoming or outgoing documents, if you specify UTF-16BE as the
charset?  If so, then I agree we have a bug.
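One way to separate an XmlBeans problem from a JDK problem is to check that the JDK's own parser accepts an incoming UTF-16BE document. The following is a minimal sketch that uses only javax.xml.parsers (no XmlBeans involved). The class name, the helper rootName, and the sample document are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class Utf16BeParseCheck {

    // Encode the XML text as UTF-16BE (no BOM), parse it with the JDK's DOM
    // parser, and return the name of the root element.
    static String rootName(String xml) throws Exception {
        byte[] bytes = xml.getBytes(Charset.forName("UTF-16BE"));
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new ByteArrayInputStream(bytes));
        return doc.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-16BE\"?><contact><name>TEST</name></contact>";
        System.out.println(rootName(xml));
    }
}
```

If this works but the same bytes fail through XmlBeans, the fault is more likely in XmlBeans' encoding handling than in the JDK.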

Yes, I tried this and it didn't work.


Ok, then I agree, there is a problem. Was it the incoming or outgoing
or both that did not work? What was the failure mode?


I will post the code on Tuesday.

This is a bug.

I'm not an XmlBeans developer, but I'm not sure I agree (unless, as I
said, there is a problem with specifying UTF-16BE). Though there
certainly may be, and probably are, some mappings missing for certain
IANA aliases.
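On the point about missing mappings, one alternative to a hand-maintained IANA-to-Java table is to let the JDK resolve the name itself. The following is a sketch under that assumption. The helper javaNameFor is hypothetical and is not an XmlBeans API; it simply passes the encoding name from the XML declaration to Charset.forName:

```java
import java.nio.charset.Charset;

public class CharsetResolve {

    // Hypothetical helper (not an XmlBeans method): let the JDK resolve an
    // encoding name, IANA or Java alias, to its canonical charset name.
    // Returns null when this JVM does not recognise the name or the name is illegal.
    static String javaNameFor(String encodingName) {
        try {
            return Charset.forName(encodingName).name();
        } catch (IllegalArgumentException e) {
            // Covers both IllegalCharsetNameException and UnsupportedCharsetException
            return null;
        }
    }

    public static void main(String[] args) {
        String[] names = { "UTF-16BE", "ISO-10646-UCS-2", "UnicodeBigUnmarked", "UnicodeBig" };
        for (String name : names) {
            System.out.println(name + " -> " + javaNameFor(name));
        }
    }
}
```

The trade-off is that the result then depends on the running JVM's alias set, which, as the lists in this thread show, varies between Java versions.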

Since 'UNICODEBIG' is not a Java or IANA character set name, perhaps the
bug is a simple typo in the character set name. I can try fixing this
and testing it on Tuesday.


bsh % b = "TEST".getBytes("UNICODEBIG");
// Error: // Uncaught Exception: target exception : at Line: 2 : in
file: <unknown file> : .getBytes ( "UNICODEBIG" )

Interesting, so UNICODEBIG is not valid, but UnicodeBig is (as shown
above).

That is interesting. The Javadocs (and the code backs this up) state that alias lookup is not case-sensitive.
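The case-insensitivity claim can be checked directly with Charset.isSupported, which accepts a canonical name or an alias. The following is a sketch; the class name and the helper allSupported are made up for illustration. On a current JDK I'd expect every variant to be reported as supported. Given the UNICODEBIG failure above, results on a 1.5-era JVM, or through String.getBytes(String) rather than Charset, may differ.

```java
import java.nio.charset.Charset;

public class CaseCheck {

    // True only if Charset.isSupported accepts every given name.
    static boolean allSupported(String... names) {
        for (String name : names) {
            if (!Charset.isSupported(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] variants = { "UnicodeBig", "UNICODEBIG", "unicodebig", "UnicodeBigUnmarked", "UNICODEBIGUNMARKED" };
        for (String name : variants) {
            System.out.println(name + " supported: " + Charset.isSupported(name));
        }
    }
}
```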

FYI the Java charset/aliases I posted were for Java 1.5. For Java 1.4.1
they are:

charset:UTF-16BE
  alias:X-UTF-16BE
  alias:UTF_16BE
  alias:ISO-10646-UCS-2

For Java 1.6 they are:

charset:UTF-16BE
  alias:X-UTF-16BE
  alias:UTF_16BE
  alias:ISO-10646-UCS-2
  alias:UnicodeBigUnmarked


Can I ask where you are getting this list from?


Sure - from the Java runtime itself:

import java.nio.charset.Charset;
import java.util.SortedMap;

public class DisplayCharsets {

        // Print every charset the running JVM supports, followed by its aliases.
        public static void main(String[] args) {
                // Keys are canonical charset names in sorted order; values are the Charset objects.
                SortedMap<String, Charset> availableCharsets = Charset.availableCharsets();
                for (Charset charset : availableCharsets.values()) {
                        System.out.println("charset:" + charset.name());
                        // aliases() returns the alternative names this JVM accepts for the charset.
                        for (String alias : charset.aliases())
                                System.out.println("  alias:" + alias);
                }
        }

}
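Building on DisplayCharsets, a reverse lookup answers the question directly: which charset, if any, lists a given name as its canonical name or one of its aliases. The following is a sketch; FindAliasOwner and ownerOf are made-up names for illustration. If UnicodeBig turns up under a charset other than UTF-16BE, that would account for it missing from the UTF-16BE list above.

```java
import java.nio.charset.Charset;

public class FindAliasOwner {

    // Scan every charset this JVM offers and return the canonical name of the
    // one whose name or aliases match the given name (case-insensitively),
    // or null if no charset claims it.
    static String ownerOf(String name) {
        for (Charset cs : Charset.availableCharsets().values()) {
            if (cs.name().equalsIgnoreCase(name)) {
                return cs.name();
            }
            for (String alias : cs.aliases()) {
                if (alias.equalsIgnoreCase(name)) {
                    return cs.name();
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "UnicodeBig";
        System.out.println(name + " belongs to: " + ownerOf(name));
    }
}
```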


