Shenxue Zhou wrote:
Try to construct an InputStreamReader obj from your InputStream obj:
InputStreamReader reader = new InputStreamReader(is, encoding);

Thanks for responding Shenxue.

I'm not having a problem with encoding characters. My problem is that XmlBeans creates an incorrect prologue, or fails to persist at all. Perhaps the subject was misleading - I was talking about the character encoding declaration in the XML prologue.

And construct an OutputStreamWriter from your ByteArrayOutputStream:
OutputStreamWriter writer = new OutputStreamWriter(baos, encoding);

Again, this isn't a character encoding issue - it's strictly a prologue issue.

Then do your reading and writing.

If you end up creating a new XmlObject for the desired encoding, try to
set the encoding property on the new XmlObject:
targetDoc.documentProperties().setEncoding(encoding);

I had already done this (shown in the previous email pasted below).

The bug (as I understand it) is that XmlBeans is only doing a one-way mapping of IANA character encodings. For example, I receive an XmlBean XML Document that starts like this:

<?xml version="1.0" encoding="UNICODEBIG"?>
...

XmlBeans handles this properly - it converts the IANA 'UNICODEBIG' character encoding into the corresponding UTF-16BE and then decodes the data into a Java String/XmlBean.

All is good incoming.

I MUST respond with the same character encoding as what the client sent me. This is impossible because XmlBeans is not capable of setting the prologue character encoding to UNICODEBIG.

At this stage, when I serialize the response Document, I call responseDoc.documentProperties().setEncoding("UNICODEBIG") because that is the encoding the client sent me. I can only assume clients will send me IANA character encodings, so I must respond in kind - a C# client would be confused if it saw a Java-specific character encoding name...

The code, text, and exception in the previous email (below) all show that this final step is broken - XmlBeans during serialization needs to do this:

0. create a prologue with the given IANA charset. The challenge here is for XmlBeans to provide an API that accepts an IANA charset name or a Java charset name (and then convert the Java charset name to an IANA name in the prologue)

1. translate the IANA charset encoding name to the Java equivalent and encode the document using this charset.
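
To make the two steps concrete, here is a minimal sketch that uses only the JDK, not XmlBeans. The class PrologueSketch, its lookup table, and the serialize() helper are hypothetical names invented for illustration; the only mapping it contains (UNICODEBIG to UTF-16BE) is the one described earlier in this email. It writes the prologue with the name the client sent and encodes the bytes with the mapped Java charset.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class PrologueSketch {
    // Hypothetical table: name declared in the prologue -> Java charset.
    // This is NOT XmlBeans' own mapping, just an illustration.
    private static final Map<String, Charset> DECLARED_TO_JAVA = new HashMap<>();
    static {
        // Assumption taken from this thread: UNICODEBIG is handled as UTF-16BE.
        DECLARED_TO_JAVA.put("UNICODEBIG", StandardCharsets.UTF_16BE);
    }

    // Step 1: translate the declared name to a Java charset.
    static Charset toJavaCharset(String declaredName) {
        Charset mapped = DECLARED_TO_JAVA.get(declaredName.toUpperCase(Locale.ROOT));
        // Fall back to the JVM's own lookup for names not in the table.
        return mapped != null ? mapped : Charset.forName(declaredName);
    }

    // Step 0: emit the prologue with the declared name unchanged,
    // then encode prologue and body with the mapped charset.
    static byte[] serialize(String declaredName, String bodyXml) {
        Charset charset = toJavaCharset(declaredName);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(baos, charset)) {
            writer.write("<?xml version=\"1.0\" encoding=\"" + declaredName + "\"?>\n");
            writer.write(bodyXml);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return baos.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = serialize("UNICODEBIG", "<response/>");
        // Decode again only to show what a UTF-16BE-aware client would read.
        System.out.println(new String(bytes, StandardCharsets.UTF_16BE));
    }
}
```

The point of the sketch is that the declared name and the Java charset are kept as two separate values, so the prologue can echo the client's name while the encoder uses whatever the JVM understands.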


The bug is that the XmlBean.documentProperties() character encoding is simply handled incorrectly. Perhaps the unit tests only test the UTF-8 case?

It would be great if someone responded with the next steps. I may have some time on Tuesday to look into a fix for this. It may change the default behaviour for folks not using UTF-8.

Thank you for reading.

Thoughts?


-----Original Message-----
From: Mark Swanson [mailto:[EMAIL PROTECTED]
Sent: Thursday, September 21, 2006 3:58 PM
To: dev@xmlbeans.apache.org
Subject: Bug: XmlBeans does not support character encodings other than UTF-8

Hello,

I'm trying to get XmlBeans to persist using a specific character set: UNICODEBIG.

    XmlDocumentProperties xmlDocumentProperties = doc.documentProperties();
    String encoding = xmlDocumentProperties.getEncoding();
    Logger.info("response encoding:" + encoding);

(encoding is UNICODEBIG)

    XmlOptions xmlOptions = new XmlOptions();
    xmlOptions.setSavePrettyPrint();
    xmlOptions.setSavePrettyPrintIndent(4);
    xmlOptions.setSaveOuter();
    ByteArrayOutputStream baos = new ByteArrayOutputStream(2048);
    InputStream is = doc.newInputStream(xmlOptions);
    byte[] buffer = new byte[2048];
    int length = 0;
    while (true) {
        length = is.read(buffer, 0, 2048);
        if (length < 0)
            break;
        baos.write(buffer, 0, length);
    }

I need to state the prologue, and XmlBeans is printing a prologue with the wrong character set: UTF-8.

<?xml version="1.0" encoding="UTF-8"?>
...


The document has the UNICODEBIG encoding, yet it prints out as UTF-8. This is contrary to the javadocs, which say newInputStream() takes the encoding into account.

I tried xmlOptions.setCharacterEncoding() but that fails with:

java.lang.RuntimeException: java.io.UnsupportedEncodingException: UNICODEBIG
    at org.apache.xmlbeans.impl.store.Saver$InputStreamSaver.<init>(Saver.java:1785)
    at org.apache.xmlbeans.impl.store.Cursor._newInputStream(Cursor.java:552)
    at org.apache.xmlbeans.impl.store.Cursor.newInputStream(Cursor.java:2442)
    at org.apache.xmlbeans.impl.values.XmlObjectBase.newInputStream(XmlObjectBase.java:156)
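
One way to narrow down an UnsupportedEncodingException like this is to ask the JVM directly which names it recognizes. The sketch below is JDK-only and says nothing about how XmlBeans resolves names internally; CharsetProbe and describe() are hypothetical names made up for this example, and the result for any given name depends on the JVM in use.

```java
import java.nio.charset.Charset;

class CharsetProbe {
    // Reports whether this JVM can resolve the given charset name and,
    // if so, the canonical name and aliases it resolves to.
    static String describe(String name) {
        if (!Charset.isSupported(name)) {
            return name + " -> not supported";
        }
        Charset charset = Charset.forName(name);
        return name + " -> " + charset.name() + " aliases=" + charset.aliases();
    }

    public static void main(String[] args) {
        // Names taken from this thread; output varies by JVM.
        System.out.println(describe("UNICODEBIG"));
        System.out.println(describe("UTF-16BE"));
        System.out.println(describe("UTF-8"));
    }
}
```

If the JVM resolves a name that the library still rejects, the failure is more likely in the library's own name handling than in the JDK.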

I'm using XmlBeans 2.1.0. Does anyone have any ideas as to why the Document's encoding properties are being ignored?

Thank you.



--
Free replacement for Exchange and Outlook (Contacts and Calendar)
http://www.ScheduleWorld.com/tg/
WebDAV: http://www.ScheduleWorld.com/sw/webDAVDir/4000.ics
VFREEBUSY: http://www.ScheduleWorld.com/sw/freebusy/4000.ifb

