[EMAIL PROTECTED] (Sean M. Burke) writes:
> OK, I just uploaded HTML-Tree 0.61 to CPAN. Look for it on a mirror
> near you soon.
HTML::Parser v3 provides the $is_cdata flag to the 'text' callback. If
it is TRUE, then it is actually wrong to decode entities in the text.
This patch to HTML::TreeBuilder fixes that (and should be harmless for
those running under older HTML::Parser versions).
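To make the intent concrete, here is a minimal sketch of the guard, not the actual TreeBuilder code. The names decode_basic and handle_text are hypothetical, and decode_basic is only a three-entity stand-in for HTML::Entities::decode so the snippet runs without any CPAN modules.

```perl
use strict;
use warnings;

# Hypothetical stand-in for HTML::Entities::decode: handles only three
# entities so that this sketch has no non-core dependencies.
my %ENTITY = (lt => '<', gt => '>', amp => '&');

sub decode_basic {
    my ($text) = @_;
    $text =~ s/&(lt|gt|amp);/$ENTITY{$1}/g;
    return $text;
}

# Hypothetical text handler: decode entities only when the parser did not
# flag the text as CDATA-like content (<script>, <style>, <xmp>, ...).
sub handle_text {
    my ($text, $is_cdata) = @_;
    return $is_cdata ? $text : decode_basic($text);
}

print handle_text('a &lt; b', 0), "\n";    # ordinary text: entity decoded
print handle_text('a &lt; b', 1), "\n";    # CDATA-like text: left verbatim
```

An older parser that never passes the second argument leaves $is_cdata undefined, so the handler decodes as before.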
One trouble with this is that HTML::Element->as_HTML should also know
about this and avoid entity encoding for the content of elements like
<script>, <style>, <xmp>, <plaintext>. After my patch the
t/parsefile.t example will output <xmp>use &lt;</xmp>, which is
kind of wrong: text inside <xmp> is taken literally, so the < should
not be encoded there.
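A rough sketch of what the output side could look like, assuming a simplified element with a single text child. This is not how HTML::Element is implemented; element_as_html and encode_basic are hypothetical names, and encode_basic is a stand-in for real entity encoding that covers only &, < and >.

```perl
use strict;
use warnings;

# Elements named above whose content is literal text.
my %CDATA_LIKE = map { $_ => 1 } qw(script style xmp plaintext);

# Hypothetical stand-in for entity encoding, covering only & < >.
sub encode_basic {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later entities survive
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical serializer for one element holding a single text child:
# skip entity encoding when the element's content is literal text.
sub element_as_html {
    my ($tag, $text) = @_;
    my $body = $CDATA_LIKE{$tag} ? $text : encode_basic($text);
    return "<$tag>$body</$tag>";
}

print element_as_html('xmp', 'use <'), "\n";    # content written verbatim
print element_as_html('b',   'use <'), "\n";    # content entity-encoded
```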
Regards,
Gisle
Index: lib/HTML/TreeBuilder.pm
===================================================================
RCS file: /home/cvs/aas/perl/mods/html-tree-s/lib/HTML/TreeBuilder.pm,v
retrieving revision 1.1.1.1
diff -u -p -u -r1.1.1.1 TreeBuilder.pm
--- lib/HTML/TreeBuilder.pm 1999/12/16 13:17:53 1.1.1.1
+++ lib/HTML/TreeBuilder.pm 1999/12/16 13:26:37
@@ -904,8 +904,9 @@ sub text {
my $text = shift;
return unless length $text;
+ my $is_cdata = shift;
- HTML::Entities::decode($text) unless $ignore_text;
+ HTML::Entities::decode($text) unless $ignore_text || $is_cdata;
my($indent, $nugget);
if($Debug) {
Index: t/parsefile.t
===================================================================
RCS file: /home/cvs/aas/perl/mods/html-tree-s/t/parsefile.t,v
retrieving revision 1.1.1.1
diff -u -p -u -r1.1.1.1 parsefile.t
--- t/parsefile.t 1999/12/16 13:17:53 1.1.1.1
+++ t/parsefile.t 1999/12/16 13:29:47
@@ -17,6 +17,8 @@ This is some text and this is a simple <
href="http://www.sn.no/libwww-perl/">link</a> back to the
<b>libwww-perl</b> site.
+<xmp>use <</xmp>
+
<foo a=b
EOT
@@ -25,6 +27,8 @@ close(F);
$h = HTML::TreeBuilder->new;
$h->parse_file($file);
unlink($file);
+
+print $h->dump;
$_ = $h->as_HTML;
print $_;