OK, if I have a text string containing XML, is it possible to
encode it in UTF-8 before passing it to the parser routine? How can
that be done?
Thanks,
Marina
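One way this could look, as a minimal sketch only: the thread does not say whether the Java or C++ flavor of Xerces is in use, so this assumes Java and the JAXP/SAX classes that ship with the JDK. The class name Utf8ParseSketch and the method parseAsUtf8 are made up for illustration. The string is turned into UTF-8 bytes, wrapped in an InputSource, and the encoding is stated explicitly with setEncoding().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of Xerces or of the original thread.
public class Utf8ParseSketch {

    // Encodes the in-memory string as UTF-8 bytes and hands them to the parser.
    static Document parseAsUtf8(String xml) throws Exception {
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);

        InputSource source = new InputSource(new ByteArrayInputStream(utf8));
        // Tell the parser which encoding the byte stream uses.
        source.setEncoding("UTF-8");

        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(source);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseAsUtf8("<greeting>caf\u00e9</greeting>");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the XML is already a Java String, another option is to give the InputSource a character stream (for example a java.io.StringReader), since the text has then already been decoded and no byte encoding is involved.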
-----Original Message-----
From: Jesse Pelton [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 9:33 AM
To: [email protected]
Subject: RE: XML decoding question
I think the answer to your question is "no." In the world
of XML, all documents are represented in some encoding. In fact, this is true of
all text that is represented as a sequence of bits. People in the US have just
gotten used to thinking of US-ASCII as being "plain text," but people whose
native languages include characters that are not representable in US-ASCII see
it differently.
In order to do anything useful, an XML processor has to
represent the characters in a document in some way that it can understand (so it
can find the characters that surround tags and attributes, etc). To comply with
the DOM spec, a processor must encode DOMStrings as UTF-16. Since implementation
is simpler (and therefore more reliable) if all characters are treated the same,
it makes sense to represent all text (internally) as UTF-16.
So the bottom line is, to do any useful work, an XML
processor MUST successfully transform the sequence of bits that make up a
document from the document encoding to an encoding that the processor
understands. In the case of Xerces, the target encoding is
UTF-16.
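To illustrate the point about decoding, here is a small sketch, assuming Java (whose String type holds UTF-16 code units internally). The class and method names are invented for the example. The same character arrives as different byte sequences depending on the document encoding, and only after decoding do the two become the same text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not taken from Xerces.
public class DecodeSketch {

    // Turns raw bytes into a Java String, given the encoding they were written in.
    static String decode(byte[] raw, Charset documentEncoding) {
        return new String(raw, documentEncoding);
    }

    public static void main(String[] args) {
        // The character U+00E9 (e with acute accent) in two document encodings.
        byte[] asLatin1 = { (byte) 0xE9 };
        byte[] asUtf8 = { (byte) 0xC3, (byte) 0xA9 };

        String fromLatin1 = decode(asLatin1, StandardCharsets.ISO_8859_1);
        String fromUtf8 = decode(asUtf8, StandardCharsets.UTF_8);

        // Different bytes on disk, identical text once decoded.
        System.out.println(fromLatin1.equals(fromUtf8));
    }
}
```

Decoding the bytes with the wrong encoding would give different characters, which is why the processor cannot simply skip this step.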
The obvious next question is what you're trying to
accomplish.
As for the signature line, perhaps if enough people point
out to pointy-haired-bosses that you can't get your work done if people pay
attention to it, and it makes the company look silly to boot, they'd get the
message. Maybe it's tilting at windmills, but then again, maybe the squeaky
wheel will get the grease.
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 18, 2005 9:11 AM
To: [email protected]
Subject: RE: XML decoding question
An XML document has an encoding clause such as encoding="utf-8". It is possible to override this in code by calling the setEncoding() method on the InputSource given to the parser. Is it possible to skip decoding altogether and just treat the file as plain text?
Sorry, I can't do anything about the text that is automatically added to the email... Corporate policy of Verizon Wireless.
