I'm not sure that I understand the question. "windows-1252
XMLCh*" is an oxymoron. windows-1252 is an 8-bit character encoding; XMLCh is a
16-bit code unit that Xerces uses to hold UTF-16. But I'll try to answer what I
think you may be driving at.
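To make the 8-bit versus 16-bit distinction concrete, here is a minimal sketch of what "transcoding windows-1252 to UTF-16" amounts to. It is not Xerces code: decodeWindows1252 and kHigh are names I made up for illustration, and the handling of the five unassigned bytes (passed through as C1 controls) is an assumption that mirrors common Windows behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Xerces): widen windows-1252 bytes to
// UTF-16 code units. Only 0x80-0x9F differ from ISO-8859-1.
std::u16string decodeWindows1252(const std::string& bytes) {
    // Code points for bytes 0x80..0x9F. The five unassigned bytes
    // (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through unchanged (assumption).
    static const char16_t kHigh[32] = {
        0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
        0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes) {
        if (b >= 0x80 && b <= 0x9F) {
            out.push_back(kHigh[b - 0x80]);           // table lookup
        } else {
            out.push_back(static_cast<char16_t>(b));  // same value as the byte
        }
    }
    // Every windows-1252 byte becomes exactly one 16-bit code unit.
    assert(out.size() == bytes.size());
    return out;
}
```

The point is that once the parser has done this step, the string you hold is UTF-16; it no longer "is" windows-1252 in any sense.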
If you have a valid XML document encoded in windows-1252
and your system understands this encoding (which Windows systems do), Xerces can
parse the document. You can then extract strings of XMLCh characters encoded in
UTF-16 from the DOM. If you want to display a string, you can use
XMLString::transcode(), which will transcode it to the current local code page.
(I just noticed that the variant that transcodes to char * is deprecated, though.
You're better off creating a local code page transcoder. Use
XMLPlatformUtils::fgTransService->makeNewLCPTranscoder().)
You've said that this produces "garbage." It might be helpful if you provided a
sample of the input and the output along with your current code page. I'm not
quite sure what would happen if the input included a character that is not
representable in the output encoding.
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 24, 2005 12:49 PM
To: [email protected]
Subject: RE: XML decoding question

Is it possible to convert a windows-1252 XMLCh* to char*, so that windows application could display it?

-----Original Message-----
From: Jesse Pelton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 24, 2005 12:34 PM
To: [email protected]
Subject: RE: XML decoding question

I strongly recommend getting the producer of your documents to produce valid XML. If they're encoded in windows-1252, the encoding declaration should reflect that. Then you can just pass the document to the parser and everything will be happy.

I suspect you're on your way to creating an elaborate (and probably delicate) workaround for a fundamental flaw outside your code. It's not surprising that you're getting confused; you're trying to outwit a carefully designed system, and I have the impression that you're not completely comfortable with character encodings, transcoding among them, and why Xerces operates as it does. All of that ceases to be an issue if your input documents are valid. Again, these documents would be valid if they correctly declared their encoding ("WINDOWS-1252" rather than "UTF-8" in the example you sent). Alternatively, whoever produces the documents could transcode all content into UTF-8 before adding it to the documents. That's probably the most robust solution.
