What does the XercesNode(doc) constructor do?
-----Original Message-----
From: Anna Simbirtsev [mailto:[EMAIL PROTECTED]
Sent: Tuesday, September 16, 2008 3:32 PM
To: [email protected]
Subject: RE: Problems with xerces-c version 1.7.0 and UTF-8
I just pass a plain XML string to the DOMParser:
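const void * const buffer = str.c_str();
::DOMParser parser;
parser.setDoNamespaces(true);
parser.setToCreateXMLDeclTypeNode(false);
// Stack-allocated input source: it is released on every exit path, so
// no manual delete is needed (and none can run twice if an exception
// is thrown after the parse).
MemBufInputSource memBufIS
(
    (const XMLByte*)buffer
    , length        // byte count of the buffer (not shown in this snippet)
    , "domtools"    // buffer id, used only in error messages
    , false         // do not adopt the buffer; str still owns it
);
try {
    parser.parse(memBufIS);
    DOM_Document doc = parser.getDocument();
    if (!doc.isNull()) return new XercesNode(doc);
} catch(...) {
    // Parse failed: fall through and return an empty node.
}
return new XercesNode();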
When I had no ICU, it was returning an empty string instead of the
UTF-8 string. To test, I just copy UTF-8 strings from wikipedia.org and
paste them right into the code. After I compiled the parser with ICU,
it returns the string, but shorter. My XML has UTF-8 encoding set:
<?xml version='1.0' encoding='UTF-8'?>.
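
The snippet above does not show how `length` is computed. A minimal sketch of one thing worth ruling out, assuming (hypothetically) that it came from a character count rather than a byte count: MemBufInputSource takes the buffer size in bytes, and a UTF-8 string containing non-ASCII characters has more bytes than characters, so a character count would cut the document short by a few bytes. The helper names below are illustrative only and are not part of Xerces.

```cpp
#include <cstddef>
#include <string>

// Number of bytes in a UTF-8 encoded std::string. This is the value a
// byte-oriented API such as MemBufInputSource expects.
std::size_t utf8ByteCount(const std::string& s) {
    return s.size();
}

// Number of Unicode code points in a UTF-8 string: count every byte that
// is not a continuation byte (continuation bytes have the form 10xxxxxx).
std::size_t utf8CodePointCount(const std::string& s) {
    std::size_t n = 0;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For the string "<a>\xC3\xA9\xD0\xB6</a>" (two 2-byte characters inside an element), utf8ByteCount gives 11 and utf8CodePointCount gives 9. Passing 9 as the length would drop the last two bytes of the closing tag. If `length` is already `str.length()` on a std::string holding UTF-8 bytes, this is not the cause and the transcoding on output is the next place to look.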
On Tue, 2008-09-16 at 15:22 -0400, Jesse Pelton wrote:
> First, that's a truly ancient version of Xerces. (Its successor was
> released over six years ago.) You might get more and better help if
> you could use a more recent version. Note that you don't need ICU to
> handle UTF-8.
>
> Second, you might search the list for questions relating to
> transcoding. Odds are good that you're not transcoding to the
> encoding you think you are, or something similar.
>
> And finally, if the search doesn't yield an answer, a brief code
> sample and sample document (attached to your message, not pasted into
> the message body) may help diagnose the problem.
>
> -----Original Message-----
> From: Anna Simbirtsev [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, September 16, 2008 3:13 PM
> To: [email protected]
> Subject: Problems with xerces-c version 1.7.0 and UTF-8
>
> Hello,
>
> I compiled xerces-c 1.7.0 with ICU 4.0 to be able to handle UTF-8
> strings. Now the parser takes in a UTF-8 string, but when it comes
> out it's truncated by a couple of characters. Can anybody help?
>
> Thank you
> Anna
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>