RE: Problems with xerces-c version 1.7.0 and UTF-8

Anna Simbirtsev Tue, 16 Sep 2008 12:45:24 -0700

It just stores the DOM_Document and provides functions like
getFirstChildElement and getNodeData.


On Tue, 2008-09-16 at 15:34 -0400, Jesse Pelton wrote:
> What does the XercesNode(doc) constructor do? 
> 
> -----Original Message-----
> From: Anna Simbirtsev [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, September 16, 2008 3:32 PM
> To: [email protected]
> Subject: RE: Problems with xerces-c version 1.7.0 and UTF-8
> 
> I just pass a plain XML string to the DOMParser.
> 
>  const void * const buffer = str.c_str();
> 
>    ::DOMParser parser;
>    parser.setDoNamespaces(true);
>    parser.setToCreateXMLDeclTypeNode(false);
>    MemBufInputSource* memBufIS = new MemBufInputSource
>      (
>       (const XMLByte*)buffer
>       , length
>       , "domtools"
>       , false
>       );
> 
>    try {
>       parser.parse(*memBufIS);
>       DOM_Document doc = parser.getDocument();
>       delete memBufIS;
>       if (!doc.isNull()) return new XercesNode(doc);
>    } catch(...) {
>       delete memBufIS;
>    }
>    return new XercesNode();
> 
> When I had no ICU, it was returning an empty string instead of the UTF-8
> string. To test, I just copy UTF-8 strings from wikipedia.org and paste
> them right into the code. After I compiled the parser with ICU, it
> returns the string, but shorter. My XML declares UTF-8 encoding:
> <?xml version='1.0' encoding='UTF-8'?>.
> 
> 
> On Tue, 2008-09-16 at 15:22 -0400, Jesse Pelton wrote:
> > First, that's a truly ancient version of Xerces.  (Its successor was
> > released over six years ago.)  You might get more and better help if
> > you could use a more recent version.  Note that you don't need ICU to
> > handle UTF-8.
> > 
> > Second, you might search the list for questions relating to
> > transcoding.  Odds are good that you're not transcoding to the
> > encoding you think you are, or something similar.
> > 
> > And finally, if the search doesn't yield an answer, a brief code
> > sample and sample document (attached to your message, not pasted into
> > the message body) may help diagnose the problem.
> > 
> > -----Original Message-----
> > From: Anna Simbirtsev [mailto:[EMAIL PROTECTED] 
> > Sent: Tuesday, September 16, 2008 3:13 PM
> > To: [email protected]
> > Subject: Problems with xerces-c version 1.7.0 and UTF-8
> > 
> > Hello,
> > 
> > I compiled xerces-c 1.7.0 with ICU 4.0 to be able to handle UTF-8
> > strings. Now the parser takes in a UTF-8 string, but when it comes out
> > it's truncated by a couple of characters. Can anybody help?
> > 
> > Thank you
> > Anna
> > 
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [EMAIL PROTECTED]
> > For additional commands, e-mail: [EMAIL PROTECTED]
> > 
> 
> 
> 


