converting to CP

Jan Suchý Tue, 16 Dec 2008 00:29:36 -0800

Hello Jesse,
Thank you for your answer :-) It looks promising; I'll look into it.
Jan



> ------------ Original message ------------
> From: Jesse Pelton <j...@pkc.com>
> Subject: RE: xerces/ICU unicode alias for weak encoding when
> serializing/converting to CP
> Date: 15.12.2008 18:15:49
> ----------------------------------------
> The constructors for the Xerces XMLFormatter object all take an UnRepFlags
> argument that allows you to specify how to handle unrepresentable characters.
> So does XMLFormatter::formatBuf().  It appears that the transcoder gets to
> decide what character to replace unrepresentable characters with.
>
> Hope that helps.
>
> -----Original Message-----
> From: Jan Suchý [mailto:zu...@post.cz]
> Sent: Monday, December 15, 2008 4:25 AM
> To: c-users@xerces.apache.org
> Subject: xerces/ICU unicode alias for weak encoding when
> serializing/converting to CP
>
> Hello all,
> I need to produce output XML in iso-8859-2 encoding.
> The input encoding is UTF-8.
> The UTF-8 XML contains a character that cannot be represented in
> iso-8859-2.
> I am using ICU 3.8, xerces 2.8 and Xqilla svn 702.
>
> After serializing the XML to iso-8859-2, the problematic character is written
> by ICU/xerces/xq as:
>
> &#x2013;
>
> The problem is that when I send a message in iso-8859-2 containing
> &#x2013; to Oracle DB, the Oracle parser rejects this character and
> returns this error:
>
> ORA-31011: XML parsing failed, LPX-00217: invalid character 8211 (U+2013)
>
> So what I am looking for is a way to tell ICU, Xerces, or XQ that such a
> Unicode character must not be included in the result and must instead be
> replaced, for example by the character "?", so that the Oracle parser never
> has to process it.
>
> I would like to find a clean solution, such as telling ICU not to call the
> callback function, or defining my own alias or behavior for this situation.
> Is it possible?
> Any ideas?
> Thank you
> Jan Suchy
>
>
>
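
For readers of the archive, the "replace with ?" behavior asked for above can
be illustrated with a small sketch. The UnRepFlags argument mentioned in the
reply is the in-library route (the XMLFormatter::UnRep_Replace value appears to
be the relevant one, but check it against the headers of your Xerces version).
The code below is NOT that API. It is a standard-library-only post-processing
fallback, and the function name replaceCharRefs and the 0x80 threshold are
assumptions made for this illustration. It rewrites numeric character
references such as &#x2013; in an already serialized document to a single "?",
leaving ASCII references like &#x3C; untouched so that escaped markup
characters survive.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper, not part of Xerces, ICU, or XQilla.
// Replaces each numeric character reference (&#NNN; or &#xHHH;) whose code
// point is 0x80 or above with `replacement`. References below 0x80 (for
// example &#x3C; for '<') and named entities such as &amp; are left as-is.
std::string replaceCharRefs(const std::string& xml, char replacement = '?')
{
    std::string out;
    out.reserve(xml.size());

    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml.compare(i, 2, "&#") == 0) {
            std::size_t j = i + 2;
            const bool hex = j < xml.size() && xml[j] == 'x';
            if (hex) ++j;

            const std::size_t digitsStart = j;
            unsigned long value = 0;
            while (j < xml.size()) {
                const unsigned char c = static_cast<unsigned char>(xml[j]);
                if (hex ? !std::isxdigit(c) : !std::isdigit(c)) break;
                const unsigned long digit =
                    std::isdigit(c) ? c - '0' : std::tolower(c) - 'a' + 10;
                value = value * (hex ? 16 : 10) + digit;
                if (value > 0x10FFFF) value = 0x110000; // clamp, avoid overflow
                ++j;
            }

            // Only a complete reference (digits followed by ';') is rewritten.
            if (j > digitsStart && j < xml.size() && xml[j] == ';' &&
                value >= 0x80) {
                out.push_back(replacement);
                i = j + 1;
                continue;
            }
        }
        out.push_back(xml[i]);
        ++i;
    }
    return out;
}
```

Calling replaceCharRefs on the serialized iso-8859-2 bytes before sending them
to the database would turn the reported &#x2013; into "?". The sketch does not
understand CDATA sections or comments, so a literal "&#x2013;" inside one would
be rewritten as well. Handling the substitution inside the serializer avoids
that limitation.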
