hi peter

betwixt escapes characters as per the xml specification. i suspect that your problem is that the xml encoding specified is not the same as the encoding produced by your code.
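
to make the mismatch concrete, here is a minimal sketch using only the jdk (no betwixt calls). the class and method names are made up for illustration, and i'm assuming that the bytes reaching the file come from a writer whose charset differs from the one named in the xml declaration. "ø" is the single byte 0xf8 in ISO-8859-1, which is not a legal start byte in UTF-8, so a parser told to expect UTF-8 should reject it.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;

public class EncodingMatchSketch {

    /**
     * Builds a tiny document whose declaration names one encoding while the
     * bytes are produced with another (or the same) charset.
     * The value is assumed to contain no markup characters.
     */
    static byte[] write(String value, String declared, String actual) {
        String xml = "<?xml version=\"1.0\" encoding=\"" + declared + "\"?>\n"
                + "<town name=\"" + value + "\"/>\n";
        return xml.getBytes(Charset.forName(actual));
    }

    /** Returns the name attribute read back, or null if the parser rejects the bytes. */
    static String readName(byte[] xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml))
                    .getDocumentElement().getAttribute("name");
        } catch (Exception e) {
            // malformed byte sequences and other parse failures end up here
            return null;
        }
    }
}
```

with declared and actual encodings equal, the value should read back unchanged without any entity escaping. the fix on the writing side is to construct the writer with an explicit charset that matches the declaration, rather than relying on the platform default.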

i wouldn't support a default encoding of characters into entities since this is controversial, without any clear consensus about the use cases. i would prefer to see an optional strategy added to betwixt which would flexibly process text after it was generated, but i don't have the time to code something like that myself at the moment.
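
for what it's worth, the text-processing step itself is small. the following is only a rough sketch of one possible strategy body, not betwixt code: the class name and method are hypothetical, and how it would be plugged into betwixt is left open. it assumes it runs after the usual xml escaping, so that the ampersands it introduces are not escaped a second time.

```java
public class NonAsciiEscaper {

    /**
     * Replaces every code point above 0x7F with a decimal character
     * reference (for example U+00F8 becomes &#248;). ASCII is copied as-is,
     * so text that was already XML-escaped passes through unchanged.
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            if (codePoint > 0x7F) {
                out.append("&#").append(codePoint).append(';');
            } else {
                out.appendCodePoint(codePoint);
            }
            // step over both halves of a surrogate pair when needed
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

output produced this way is pure ascii, so it reads back the same whichever ascii-compatible encoding the declaration names. the cost is that the files are harder to read by eye.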

if you feel like contributing a patch to do this, then it would be gratefully accepted. see http://jakarta.apache.org/commons/patches.html and http://jakarta.apache.org/site/getinvolved.html for some guidelines about how to go about this. please include test cases.

- robert

On Sunday, June 22, 2003, at 12:40 PM, Peter Nuernberg wrote:

Hi-

I have the following problem. I have some beans that have attributes that may contain certain non-ascii characters (for example, the "o with a slash" or "a-e ligature" one sees in Danish words). When these are written out to xml attributes, they are not escaped. When I try to read beans with attributes containing these characters, the parser complains (org.xml.sax.SAXParseException: Character conversion error: "Unconvertible UTF-8 character beginning with 0xf8" (line number may be too low). at org.apache.crimson.parser.InputEntity.fatal(InputEntity.java:1100))


I've looked at the way in which attribute values are escaped, but haven't found much useful there. It seems characters such as ampersands are escaped, but not non-ascii characters. I'm not sure whether this is really a betwixt bug or not - maybe the parser is incorrectly rejecting well-formed xml... I know that if I replace the character in the xml file with an escape (e.g., o slash with &#248;), the file is read correctly, but it is then saved incorrectly (since the ampersand will now be escaped, yielding &amp;#248; in the example above).

I've recently ported my system from my own home-brewed xml persistence mechanism to betwixt. In my system, I escaped all non-ascii characters with their equivalent decimal codes. I've looked through the faq and user mailing lists, but didn't see a reference to this problem. Can someone provide a hint as to how to get around this problem? Would it be preferable to simply escape non-ascii characters? (If so, I could certainly provide a first pass at the code for that.)

Thanks for any help.

-Peter



---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


