Hi,

Quite simply, that is not well-formed XML.

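The ampersand is a 'special' character in XML: it marks the start of an entity reference. To get a literal ampersand in your text, you must write its entity reference ( &amp; ) instead of the character itself, e.g. BB&amp;T rather than BB&T.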
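
If you control how the files are generated, the text can be escaped before it is written out. Here is a minimal sketch in plain C++; the helper name escapeXmlText is my own invention and not part of Xerces, and it assumes the input is raw, not-yet-escaped text (running it on already-escaped text would escape the ampersands a second time):

```cpp
#include <string>

// Hypothetical helper (not part of Xerces-C++): escapes markup characters
// in raw text so it can be placed inside an XML element.
// '&' and '<' must always be escaped in character data; '>' is escaped
// as well, which is always permitted and avoids the "]]>" corner case.
std::string escapeXmlText(const std::string& raw) {
    std::string out;
    out.reserve(raw.size());
    for (char c : raw) {
        switch (c) {
            case '&': out += "&amp;"; break;
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            default:  out += c;       break;
        }
    }
    return out;
}
```

Calling escapeXmlText("BB&T Corp.") gives "BB&amp;T Corp.", and the parser will hand the text back to you as "BB&T Corp." when you read the #text node.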

You should find plenty of material on this via search engines or basic XML tutorials. The full XML specification is available from www.w3.org, but the following two articles should suffice and provide pointers for further reading:

http://www.xml.com/pub/a/2003/02/26/qa.html
http://www.xml.com/pub/a/2001/01/31/qanda.html


Regards

Dara


Xiaolei Li wrote:

Hi,

I'm trying to read in all the #text nodes in a set of XML documents, but I'm running into problems when the document content includes ampersands (&) in the text.

So given a document path, I use XercesDOMParser to get the root DOMNode*. Using that node, I traverse the entire tree looking for #text nodes. Whenever I see a #text node, I call getNodeValue() and then XMLString::transcode() on the result to get the char*.

This works fine until I run into a document that has & in its content. For example,
=========================
...
<TEXT>


Maryland Federal Bancorp Inc., a Hyattsville-based thrift, announced yesterday that it will be acquired by BB&T Corp. of Winston-Salem, N.C., for $ 265.3
million in stock.
...
=========================

For some reason, the char* I get back from XMLString::transcode() only gives me the text up to "BB" (in "BB&T"). If I manually delete the & from the file, it'll parse just fine. So basically, the "&" is ending the text prematurely.

I'm a total XML noob so I have no clue what to do here. I'm probably just missing something very basic. Any guidance would be greatly appreciated.

Thank you.

-Xiaolei



