I'm running web pages through HTML::TreeBuilder, and certain characters, namely <, >, ', and &, are being encoded. For example, I ran the following text through TreeBuilder:

     The dog's collar 4 > 2 and 2 < 4 & amper

And the output is:

     The dog&#39;s collar 4 &gt; and 2 &lt; 4 &amp; amper
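A minimal sketch that reproduces this. It assumes the text is parsed with parse()/eof() and written back out with as_HTML(); the post doesn't say which output method was used. Called with no arguments, as_HTML() entity-encodes "unsafe" characters (which is where &#39; comes from). Passing an explicit string of characters encodes only those, and as_text() returns the decoded text.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $input = q{The dog's collar 4 > 2 and 2 < 4 & amper};

my $tree = HTML::TreeBuilder->new;
$tree->parse($input);
$tree->eof;

# No $entities argument: unsafe characters (including ') are entity-encoded
my $encoded = $tree->as_HTML;

# Explicit $entities argument: only <, > and & are encoded; ' is left alone
my $minimal = $tree->as_HTML('<>&');

# as_text() gives the decoded character data with no markup or entities
my $text = $tree->as_text;

print "$encoded\n$minimal\n$text\n";

# Free the tree's circular references
$tree->delete;
```

Encoding <, > and & on output is what keeps the serialized HTML well-formed; the apostrophe is only encoded because it falls in the default "unsafe" set.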

My question, then: is this the expected and acceptable behavior for TreeBuilder? According to W3.org (http://www.w3.org/TR/html4/charset.html), a user agent (browser) will translate character sets, but TreeBuilder doesn't purport to be a user agent the way LWP does. Or does it?

I'm confused and need clarification.

I am using TreeBuilder 3.13 and I have HTML::Tree 3.20 installed on Perl 5.8.7.


--
[EMAIL PROTECTED]
Position Research, Inc.
Search engine results by research
tel: (760) 480-8291 fax: (760) 480-8271
www.PositionResearch.com



