What does the XercesNode(doc) constructor do? 

-----Original Message-----
From: Anna Simbirtsev [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 16, 2008 3:32 PM
To: [email protected]
Subject: RE: Problems with xerces-c version 1.7.0 and UTF-8

I pass just a plain XML string to the DOMParser.

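   // Raw bytes of the XML document held in the std::string `str`.
   const void * const buffer = str.c_str();

   ::DOMParser parser;
   parser.setDoNamespaces(true);
   parser.setToCreateXMLDeclTypeNode(false);
   // The second argument is the buffer size in bytes, not in characters.
   MemBufInputSource* memBufIS = new MemBufInputSource
     (
      (const XMLByte*)buffer
      , length
      , "domtools"   // fake system ID for the buffer
      , false        // the input source does not adopt (free) the buffer
      );

   try {
      parser.parse(*memBufIS);
      DOM_Document doc = parser.getDocument();
      delete memBufIS;
      memBufIS = 0;  // avoid a double delete if the next line throws
      if (!doc.isNull()) return new XercesNode(doc);
   } catch(...) {
      // Any failure falls through to the empty node below.
      delete memBufIS;
   }
   return new XercesNode();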

Without ICU, the parser returned an empty string instead of the UTF-8
string. To test, I copy UTF-8 strings from wikipedia.org and paste them
directly into the code. After I compiled the parser with ICU, it returns
the string, but shorter. My XML declares UTF-8 encoding: <?xml
version='1.0' encoding='UTF-8'?>.
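
The snippet above does not show how `length` is computed. One possible
cause of a result that is short by a couple of characters, which nothing
in this thread confirms, is passing a character count where
MemBufInputSource expects a byte count: in UTF-8 every non-ASCII
character occupies more than one byte. A minimal sketch of the
difference, using only the standard library (both helper names are
hypothetical, not part of Xerces):

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of bytes in the buffer. This is the kind of
// value a byte-count parameter expects.
std::size_t utf8ByteCount(const std::string& s) {
    return s.size();
}

// Hypothetical helper: number of Unicode code points, counted by skipping
// UTF-8 continuation bytes (bit pattern 10xxxxxx). Assumes valid UTF-8.
std::size_t utf8CodePointCount(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}
```

If `length` were derived from the code point count, the parser would see
a buffer cut short by one byte for each two-byte character, two bytes for
each three-byte character, and so on. Passing `str.size()` avoids that.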


On Tue, 2008-09-16 at 15:22 -0400, Jesse Pelton wrote:
> First, that's a truly ancient version of Xerces.  (Its successor was
> released over six years ago.)  You might get more and better help if
> you could use a more recent version.  Note that you don't need ICU to
> handle UTF-8.
> 
> Second, you might search the list for questions relating to
> transcoding.  Odds are good that you're not transcoding to the
> encoding you think you are, or something similar.
> 
> And finally, if the search doesn't yield an answer, a brief code
> sample and sample document (attached to your message, not pasted into
> the message body) may help diagnose the problem.
> 
> -----Original Message-----
> From: Anna Simbirtsev [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, September 16, 2008 3:13 PM
> To: [email protected]
> Subject: Problems with xerces-c version 1.7.0 and UTF-8
> 
> Hello,
> 
> I compiled xerces-c 1.7.0 with ICU 4.0 to be able to handle UTF-8
> strings. Now the parser takes in a UTF-8 string, but when it comes
> out it's truncated by a couple of characters. Can anybody help?
> 
> Thank you
> Anna
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 

