Re: Invalid UTF-8 character encoding in SOAP response

Andreas Veithen Mon, 09 Jun 2008 13:59:28 -0700

Aman,

D869 DE1A is actually the surrogate pair for the character with codepoint 2A61A, which is encoded as F0AA989A in UTF-8 (see http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi). The two other character references (&#xD858;&#xDF4C;) correspond to another character. I'm not an expert, but the XML specs don't mention surrogate pairs and I think that the correct way of encoding the character as a character reference should be &#x2A61A; in this case. This definitely looks like a bug in the XML parser. I would try to replace the XML parser by a new version of the same parser or by another parser. I'm not familiar with Axis 1, so I don't know what kind of parser (SAX or StAX) it uses. Maybe somebody else on the list can give a hint?
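
The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch: the function names combine_surrogates and char_reference are made up for illustration and are not part of Axis or of any XML library.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    # Each surrogate carries 10 bits of the code point, offset by 0x10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def char_reference(code_point: int) -> str:
    """Format one code point as a hexadecimal XML character reference."""
    return "&#x{:X};".format(code_point)


cp = combine_surrogates(0xD869, 0xDE1A)
print(hex(cp))                                 # 0x2a61a
print(chr(cp).encode("utf-8").hex().upper())   # F0AA989A
print(char_reference(cp))                      # &#x2A61A;
```

The same function applied to D858 DF4C gives a different code point, which is why those two references appear to belong to a second character.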


Andreas


On 9 June 08, at 22:18, Amandeep Singh wrote:

Hi All,
I am using Axis 1.3. If the response contains a CJK character in UTF-8, Axis converts it into an XML entity. On the receiver side, XML parsing fails saying that it is an invalid XML entity.
The character used has UTF-8 value F0AA989A. Axis converts it into &#xD869;&#xDE1A;&#xD858;&#xDF4C;, and the parser fails at the first entity.
Any ideas/hints would be greatly appreciated.

Thanks,
Aman


