Hi,
I'm trying to read in all the #text nodes in a set of XML documents,
but I'm running into problems when the document content includes
ampersands (&) in the text.
Given a document path, I use XercesDOMParser to get the root
DOMNode*. Starting from that node, I traverse the entire tree looking
for #text nodes. Whenever I find one, I call getNodeValue() and pass
the result to XMLString::transcode() to get a char*.
This works fine until I run into a document that has an & in its
content. For example,
=========================
...
<TEXT>
Maryland Federal Bancorp Inc., a Hyattsville-based thrift, announced yesterday
that it will be acquired by BB&T Corp. of Winston-Salem, N.C., for $ 265.3
million in stock.
...
=========================
For some reason, the char* I get back from XMLString::transcode()
only contains the text up to "BB" (in "BB&T"). If I manually delete
the & from the file, it parses just fine. So basically, the "&" is
ending the text prematurely.
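One possible explanation, stated as an assumption rather than a diagnosis: if the files contain a bare & (not written as &amp;), they are not well-formed XML, and a parser may stop at that character, which would match the text being cut off after "BB". Under that assumption, the manual workaround above could be automated by escaping bare ampersands in the raw file contents before handing them to the parser. The sketch below uses only the standard library; the helper names looksLikeReference and escapeBareAmpersands are hypothetical and not part of Xerces. It does not treat CDATA sections or comments specially, so it is only suitable for files that don't rely on them.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if the '&' at s[pos] begins something shaped like
// an entity or character reference, e.g. "&amp;", "&#38;" or "&#x26;".
static bool looksLikeReference(const std::string& s, std::size_t pos) {
    std::size_t i = pos + 1;
    if (i >= s.size()) return false;

    if (s[i] == '#') {                       // numeric character reference
        ++i;
        const bool hex = (i < s.size() && s[i] == 'x');
        if (hex) ++i;
        const std::size_t start = i;
        while (i < s.size()) {
            const unsigned char c = static_cast<unsigned char>(s[i]);
            if (!(hex ? std::isxdigit(c) : std::isdigit(c))) break;
            ++i;
        }
        return i > start && i < s.size() && s[i] == ';';
    }

    // Named reference: an ASCII approximation of an XML name, then ';'.
    const unsigned char first = static_cast<unsigned char>(s[i]);
    if (!(std::isalpha(first) || s[i] == '_' || s[i] == ':')) return false;
    while (i < s.size()) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (!(std::isalnum(c) || s[i] == '_' || s[i] == ':' ||
              s[i] == '.' || s[i] == '-')) break;
        ++i;
    }
    return i < s.size() && s[i] == ';';
}

// Hypothetical helper: returns a copy of `in` with every bare '&' replaced by
// "&amp;". Ampersands that already start a reference are left untouched.
std::string escapeBareAmpersands(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (in[i] == '&' && !looksLikeReference(in, i)) {
            out += "&amp;";
        } else {
            out += in[i];
        }
    }
    return out;
}
```

With this, the sample line "acquired by BB&T Corp." would become "acquired by BB&amp;T Corp.", while text that already says "AT&amp;T" is left alone. The escaped string could then be parsed from memory instead of from the file path. Whether that is preferable to fixing the files at the source depends on where they come from.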
I'm a total XML noob so I have no clue what to do here. I'm probably
just missing something very basic. Any guidance would be greatly
appreciated.
Thank you.
-Xiaolei