Re: XmlBeans does not support character encodings other than UTF-8

Mark Swanson Thu, 21 Sep 2006 20:35:38 -0700

Shenxue Zhou wrote:

Try to construct an InputStreamReader obj from your InputStream obj:
InputStreamReader reader = new InputStreamReader(is, encoding);


Thanks for responding Shenxue.

I'm not having a problem with encoding characters, I'm having a problem where XmlBeans is creating an incorrect prologue - or failing to persist at all. Perhaps the subject was misleading - I was talking about the character encoding definition in the XML prologue.

And construct an OutputStreamWriter from your ByteArrayOutputStream:
OutputStreamWriter writer = new OutputStreamWriter(baos, encoding);

Again, this isn't a character encoding issue - it's strictly a prologue issue.

Then do your reading and writing.
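
For reference, the reader/writer approach suggested above can be sketched with plain JDK classes. This is only an illustration: the class name `ReaderWriterSketch` and the method `transcode` are hypothetical and are not part of XmlBeans.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;

// Hypothetical helper, not an XmlBeans class.
class ReaderWriterSketch {

    // Decodes the stream with inEncoding and re-encodes the characters with outEncoding.
    public static byte[] transcode(InputStream is, String inEncoding, String outEncoding)
            throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
        try (Reader reader = new InputStreamReader(is, inEncoding);
             Writer writer = new OutputStreamWriter(baos, outEncoding)) {
            char[] buffer = new char[2048];
            int length;
            while ((length = reader.read(buffer, 0, buffer.length)) >= 0) {
                writer.write(buffer, 0, length);
            }
        } // closing the writer flushes any buffered characters into baos
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "<a>h\u00e9</a>".getBytes("UTF-8");
        byte[] utf16 = transcode(new ByteArrayInputStream(utf8), "UTF-8", "UTF-16BE");
        System.out.println(utf16.length + " bytes after transcoding");
    }
}
```

Note that this only changes how the characters are encoded as bytes. It does not touch whatever encoding name is written in the XML prologue.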

If you end up creating a new XmlObject for the desired encoding, try to
set the encoding property on the new XmlObject:
targetDoc.documentProperties().setEncoding(encoding);


I had already done this (shown in the previous email pasted below).

The bug (as I understand it) is that XmlBeans is only doing a one-way mapping of IANA character encodings. For example, I receive an XmlBean XML Document that starts like this:


<?xml version="1.0" encoding="UNICODEBIG"?>
...

XmlBeans handles this properly - it converts the IANA 'UNICODEBIG' character encoding into the corresponding UTF-16BE and then decodes the data into a Java String/XmlBean.


All is good incoming.
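
To make "one-way" versus "two-way" concrete, here is a minimal sketch of a lookup that can be queried in both directions. The class `TwoWayEncodingMap` and its methods are hypothetical, and the single UNICODEBIG/UTF-16BE pairing is taken from the description above. It is not XmlBeans' actual table.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical two-way lookup between a declared (prologue) name and a Java charset name.
class TwoWayEncodingMap {
    private final Map<String, String> declaredToJava = new HashMap<>();
    private final Map<String, String> javaToDeclared = new HashMap<>();

    // Registers one pairing so that it can be looked up from either side.
    public void register(String declaredName, String javaName) {
        declaredToJava.put(key(declaredName), javaName);
        javaToDeclared.put(key(javaName), declaredName);
    }

    // Incoming direction: declared name -> Java name, or null if unknown.
    public String toJava(String declaredName) {
        return declaredToJava.get(key(declaredName));
    }

    // Outgoing direction: Java name -> declared name, or null if unknown.
    public String toDeclared(String javaName) {
        return javaToDeclared.get(key(javaName));
    }

    // Lookups ignore case.
    private static String key(String name) {
        return name.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        TwoWayEncodingMap map = new TwoWayEncodingMap();
        map.register("UNICODEBIG", "UTF-16BE"); // pairing assumed from the message above
        System.out.println(map.toJava("UNICODEBIG"));
        System.out.println(map.toDeclared("UTF-16BE"));
    }
}
```

A one-way mapping corresponds to having only `toJava`. The outgoing case needs `toDeclared` as well, or needs the originally declared name to be remembered.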

I MUST respond with the same character encoding as what the client sent me. This is impossible because XmlBeans is not capable of setting the prologue character encoding to UNICODEBIG.

At this stage, when I serialize the response Document, I set its encoding with responseDoc.documentProperties().setEncoding("UNICODEBIG"); I do this because this is what the client sent me. I can only assume clients will send me IANA character encodings, so I must respond in kind - a C# client would be confused if it saw a Java-specific character encoding name...

The code, text, and exception in the previous email (below) all show that this final step is broken - XmlBeans during serialization needs to do this:

0. create a prologue with the given IANA charset. The challenge here is for XmlBeans to provide an API that accepts an IANA charset name or a Java charset name (and then convert the Java charset name to an IANA name in the prologue)

1. translate the IANA charset encoding name to the Java equivalent and encode the document using this charset.
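
The two steps above can be sketched with plain JDK classes. This is only an illustration under stated assumptions: the class `PrologueSketch`, the method `serialize`, and the lookup table are hypothetical, the UNICODEBIG to UTF-16BE pairing is taken from this message, and none of it is XmlBeans code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the two serialization steps; not XmlBeans code.
class PrologueSketch {

    // Assumed lookup table: name declared in the prologue -> Java charset.
    private static final Map<String, Charset> DECLARED_TO_JAVA = new HashMap<>();
    static {
        DECLARED_TO_JAVA.put("UNICODEBIG", StandardCharsets.UTF_16BE);
        DECLARED_TO_JAVA.put("UTF-8", StandardCharsets.UTF_8);
    }

    public static byte[] serialize(String body, String declaredName) throws IOException {
        // Step 1: translate the declared name to the Java charset used for encoding.
        Charset charset = DECLARED_TO_JAVA.get(declaredName.toUpperCase(Locale.ROOT));
        if (charset == null) {
            throw new IllegalArgumentException("No mapping for encoding: " + declaredName);
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(baos, charset)) {
            // Step 0: the prologue carries the name the client used, unchanged.
            writer.write("<?xml version=\"1.0\" encoding=\"" + declaredName + "\"?>\n");
            writer.write(body);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize("<response/>", "UNICODEBIG");
        System.out.println(new String(bytes, StandardCharsets.UTF_16BE));
    }
}
```

The point of the sketch is that the name written into the prologue and the charset used to produce the bytes are looked up separately, so the response can echo the client's name while Java encodes with its own equivalent.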

The bug is that the XmlBean.documentProperties() character encoding is just plain handled wrong. Perhaps the unit tests only test the UTF-8 case?

It would be great if someone responded with the next steps. I may have some time on Tuesday to look into a fix for this. It may change the default behaviour for folks not using UTF-8.


Thank you for reading.

Thoughts?

-----Original Message-----
From: Mark Swanson [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 21, 2006 3:58 PM
To: dev@xmlbeans.apache.org
Subject: Bug: XmlBeans does not support character encodings other than UTF-8

Hello,
I'm trying to get XmlBeans to persist using a specific character set: UNICODEBIG.

     XmlDocumentProperties xmlDocumentProperties = doc.documentProperties();
     String encoding = xmlDocumentProperties.getEncoding();
     Logger.info("response encoding:" + encoding);

(encoding is UNICODEBIG)

     XmlOptions xmlOptions = new XmlOptions();
     xmlOptions.setSavePrettyPrint();
     xmlOptions.setSavePrettyPrintIndent(4);
     xmlOptions.setSaveOuter();
     ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
     InputStream is = doc.newInputStream(xmlOptions);
     byte[] buffer = new byte[2048];
     int length = 0;
     while (true) {
       length = is.read(buffer, 0, 2048);
       if (length < 0)
         break;
       baos.write(buffer, 0, length);
     }

I need to state the prologue, and XmlBeans is printing a prologue with the wrong character set: UTF-8.
<?xml version="1.0" encoding="UTF-8"?>
...
The document has the UNICODEBIG encoding. Yet, it prints out as UTF-8. This is contrary to the javadocs, which say newInputStream() takes the encoding into account.
I tried xmlOptions.setCharacterEncoding() but that fails with:

java.lang.RuntimeException: java.io.UnsupportedEncodingException: UNICODEBIG
    at org.apache.xmlbeans.impl.store.Saver$InputStreamSaver.<init>(Saver.java:1785)
    at org.apache.xmlbeans.impl.store.Cursor._newInputStream(Cursor.java:552)
    at org.apache.xmlbeans.impl.store.Cursor.newInputStream(Cursor.java:2442)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.newInputStream(XmlObjectBase.java:156)
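
For readers unfamiliar with this exception, the shape of the trace above (an UnsupportedEncodingException wrapped in a RuntimeException) is what the JDK produces when a writer is opened with an encoding name the JVM does not recognise and the checked exception is rethrown unchecked. The sketch below reproduces that shape with plain JDK classes. The class name is hypothetical, `X-NO-SUCH-ENCODING` is a made-up name chosen to be unsupported, and whether XmlBeans' Saver works this way internally is an assumption.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.UnsupportedEncodingException;

// Hypothetical reproduction of the exception shape; not XmlBeans code.
class UnsupportedEncodingSketch {

    // Opens a writer for the named encoding, rethrowing the checked exception unchecked.
    public static OutputStreamWriter open(ByteArrayOutputStream baos, String encoding) {
        try {
            return new OutputStreamWriter(baos, encoding);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        try {
            open(new ByteArrayOutputStream(), "X-NO-SUCH-ENCODING");
        } catch (RuntimeException e) {
            // The cause is the UnsupportedEncodingException raised by the JDK.
            System.out.println(e);
        }
    }
}
```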
I'm using XmlBeans 2.1.0. Does anyone have any ideas as to why the Document's encoding properties are being ignored?
Thank you.



--
Free replacement for Exchange and Outlook (Contacts and Calendar)
http://www.ScheduleWorld.com/tg/
WebDAV: http://www.ScheduleWorld.com/sw/webDAVDir/4000.ics
VFREEBUSY: http://www.ScheduleWorld.com/sw/freebusy/4000.ifb

