RE: problem with XML encoding

Jesse Pelton Tue, 09 Aug 2005 08:10:59 -0700

When I edited the document to change the encoding from UTF-8 to WINDOWS-1252, both DOMPrint and SAX2Print were able to process the file. If you run the same experiment and get the same results, this indicates a problem with your application rather than with Xerces.
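
To illustrate why the UTF-8 declaration fails for a file like this, here is a minimal sketch in plain C++ (it does not use Xerces). It assumes the pasted Word characters are "smart quotes", which Windows-1252 stores as the single bytes 0x93 and 0x94; I have not confirmed that those are the exact characters in your file. The function name looksLikeUtf8 is my own, and the check is deliberately simplified.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true if the bytes in `s` follow the UTF-8
// lead-byte / continuation-byte pattern. Simplified on purpose: it does not
// reject overlong encodings or surrogate code points.
bool looksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char b = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80) {
            extra = 0;          // plain ASCII
        } else if ((b & 0xE0) == 0xC0) {
            extra = 1;          // 2-byte sequence
        } else if ((b & 0xF0) == 0xE0) {
            extra = 2;          // 3-byte sequence
        } else if ((b & 0xF8) == 0xF0) {
            extra = 3;          // 4-byte sequence
        } else {
            return false;       // stray continuation byte (0x80-0xBF) or invalid lead
        }
        if (extra > s.size() - i - 1) {
            return false;       // sequence is cut off at the end of the input
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                return false;   // expected a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

Called on the Windows-1252 bytes "\x93quoted\x94", the function returns false, because 0x93 can only appear in UTF-8 as a continuation byte. Called on the UTF-8 form of the same text, "\xE2\x80\x9Cquoted\xE2\x80\x9D", it returns true. A parser that is told the document is UTF-8 hits the same problem on the first such byte, which is consistent with the "invalid character" error you describe.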

If your application is overriding the document's declared encoding, note that this is risky business. Documents should correctly declare their encoding. When an application overrides the document encoding, it presumes to know more about the document than the document's author. It's sometimes necessary nonetheless, as the documentation for InputSource::setEncoding() points out, but this case does not seem to fit the pattern described there.

From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 09, 2005 10:19 AM
To: [email protected]
Subject: problem with XML encoding

We are using the Xerces SAX parser to parse incoming XML. In some cases the XML contains characters that were copied and pasted from an MS Word document. It seems that the character set should be "windows-1252" in this case.

If such an XML document is parsed with "utf-8" encoding, Internet Explorer and our application give the same error message, saying that an invalid character was encountered. When the XML is parsed with "windows-1252", IE is able to display it properly, but our application is not. The character set in our application is set to 1252.

Why are we not able to display the characters properly? Does anybody know the solution to this?

Attached are the sample XML file and a Word document with screenshots of the problem in our application.

Thanks,

Marina

908 607 8580
