Hi,

I'm trying to read in all the #text nodes in a set of XML documents, but I'm running into problems when the document content includes ampersands (&) in the text.

Given a document path, I use XercesDOMParser to parse the file and get the root DOMNode*. Starting from that node, I traverse the entire tree looking for #text nodes. Whenever I find one, I call getNodeValue() on it and pass the result to XMLString::transcode() to get a char*.
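
The traversal described above can be sketched roughly as follows. This is a simplified stand-in, not the actual Xerces code: Node, NodeType, and collectText are hypothetical names used only for illustration, with Node playing the role of DOMNode and the value field playing the role of the transcoded getNodeValue() result.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the parts of a DOM node the traversal needs.
// In the real code this would be a Xerces DOMNode*.
enum class NodeType { Element, Text };

struct Node {
    NodeType type;
    std::string value;          // text content; left empty for elements
    std::vector<Node> children; // child nodes in document order
};

// Depth-first walk: append the value of every text node to `out`.
void collectText(const Node& node, std::vector<std::string>& out) {
    if (node.type == NodeType::Text) {
        out.push_back(node.value);
    }
    for (const Node& child : node.children) {
        collectText(child, out);
    }
}
```

Calling collectText(root, out) on the root leaves one string per #text node in out, in document order, so each text node's value is kept as a separate entry.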

This works fine until I run into a document that has & in its content. For example,
=========================
...
<TEXT>


Maryland Federal Bancorp Inc., a Hyattsville-based thrift, announced yesterday that it will be acquired by BB&T Corp. of Winston-Salem, N.C., for $ 265.3
million in stock.
...
=========================

For some reason, the char* I get back from XMLString::transcode() only contains the text up to "BB" (in "BB&T"). If I manually delete the & from the file, it parses just fine and I get the full text. So the "&" seems to be ending the text prematurely.

I'm a total XML noob so I have no clue what to do here. I'm probably just missing something very basic. Any guidance would be greatly appreciated.

Thank you.

-Xiaolei
