[dom4j-user] problem with numeric character reference

Kevin Varley Tue, 10 Aug 2004 23:02:12 -0700

Hello all,

I'm having some trouble with a numeric character
reference.  I have some well-formed UTF-8 encoded HTML
that I am pulling from a database table and would like
to parse and manipulate with dom4j. Some of the HTML
contains numeric character references like &#8221;
to represent a right double quotation mark.  After
creating a Document object with SAXReader, the
references are converted to a single character.  For
example, &#8221; is converted to a character that shows
up as 1C when viewed in a hex editor.


So I guess I'd like to know whether there is a means
of disabling the processing of numeric character
references? I realize this may be a parser issue but
was curious if anyone had run into a similar problem.
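
To make the behaviour concrete, here is a minimal sketch using only the
JDK's built-in parser rather than dom4j's SAXReader. It rests on the
assumption that the reference is expanded by the XML parser itself, in
which case one workaround is to turn non-ASCII characters back into
numeric references on output. The class and method names (CharRefDemo,
parseText, escapeNonAscii) are made up for illustration and are not part
of dom4j.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only; not part of dom4j.
class CharRefDemo {

    // Parse a small XML string with the JDK parser and return the
    // text content of the root element.
    static String parseText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    // Re-encode every code point above 127 as a decimal numeric
    // character reference. Markup characters such as & and < are left
    // alone here; a real serializer would have to escape those as well.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = parseText("<p>quote&#8221;</p>");
        // After parsing, the reference has become the single character U+201D.
        System.out.println(Integer.toHexString(text.codePointAt(5)));
        // Writing it back out as a numeric reference again.
        System.out.println(escapeNonAscii(text));
    }
}
```

With something like this, the character data stays as ordinary Unicode
while the document is being manipulated, and the references reappear
only when the text is written out.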

Thanks in advance for any help.

Kevin



        
                
