The constructors for the Xerces XMLFormatter object all take an UnRepFlags 
argument that allows you to specify how to handle unrepresentable characters.  
So does XMLFormatter::formatBuf().  It appears that the transcoder gets to 
decide what character to replace unrepresentable characters with.

Hope that helps.

-----Original Message-----
From: Jan Suchý [mailto:zu...@post.cz] 
Sent: Monday, December 15, 2008 4:25 AM
To: c-users@xerces.apache.org
Subject: xerces/ICU unicode alias for weak encoding when serializing/converting 
to CP

Hello all,
I need to produce output XML in iso-8859-2 encoding.
The input encoding is UTF-8.
The UTF-8 XML contains a character that is not representable in iso-8859-2.
I am using ICU 3.8, Xerces 2.8 and XQilla svn 702.

After serializing XML to iso-8859-2 the problematic character is serialized by 
ICU/xerces/xq to:

&#x2013;

The problem is that when I send an iso-8859-2 message containing this 
serialized character to an Oracle DB, the Oracle parser rejects it with 
this error:

ORA-31011: XML parsing failed, LPX-00217: invalid character 8211 (U+2013)

So what I am looking for is a way to tell ICU, Xerces or XQilla that this 
Unicode character must not be included in the result and should instead be 
replaced, for example by the character "?", so that the Oracle parser never 
has to process it.

I would like to find a clean solution, such as telling ICU not to call the 
callback function, or defining my own alias or behavior for this situation. 
Is that possible?
Any ideas?
Thank you
Jan Suchy
