[EMAIL PROTECTED] (Sean M. Burke) writes:

> OK, I just uploaded HTML-Tree 0.61 to CPAN.  Look for it on a mirror
> near you soon.

HTML::Parser v3 provides the $is_cdata flag to the 'text' callback.  If
it is TRUE, the text is literal and it is wrong to decode entities in it.
This patch to HTML::TreeBuilder fixes that (and should be harmless for
those running under older HTML::Parser versions, where the flag is
simply never passed and so is undefined).

One trouble with this is that HTML::Element->as_HTML should also know
about this and avoid entity encoding for the content of elements like
<script>, <style>, <xmp>, <plaintext>.  After my patch the
t/parsefile.t example will output <xmp>use &amp;lt;</xmp>, which is
wrong: the literal text &lt; gets its ampersand encoded on output.
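
A minimal sketch of the output-side fix described above, not the actual
HTML::Element code.  The names %is_cdata_element, escape_text and
render_text are hypothetical, the element list is just the four elements
named above, and escape_text is a hand-rolled stand-in for a real entity
encoder so that the example has no module dependencies.

```perl
use strict;
use warnings;

# Elements whose content is literal text (assumed list, taken from the
# elements mentioned in this message).
my %is_cdata_element = map { $_ => 1 } qw(script style xmp plaintext);

# Stand-in entity encoder: escapes only &, < and >.
sub escape_text {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;    # must run first so later escapes are not re-encoded
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Return the text as it should appear inside <$tag>...</$tag>:
# verbatim for literal-content elements, entity-encoded otherwise.
sub render_text {
    my ($tag, $text) = @_;
    return $is_cdata_element{ lc $tag } ? $text : escape_text($text);
}

print render_text('xmp', 'use &lt;'), "\n";    # prints: use &lt;
print render_text('b',   'a < b'),    "\n";    # prints: a &lt; b
```

With a check like this on the output side, the <xmp> content from
t/parsefile.t would round-trip unchanged instead of coming out as
&amp;lt;.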

Regards,
Gisle



Index: lib/HTML/TreeBuilder.pm
===================================================================
RCS file: /home/cvs/aas/perl/mods/html-tree-s/lib/HTML/TreeBuilder.pm,v
retrieving revision 1.1.1.1
diff -u -p -u -r1.1.1.1 TreeBuilder.pm
--- lib/HTML/TreeBuilder.pm     1999/12/16 13:17:53     1.1.1.1
+++ lib/HTML/TreeBuilder.pm     1999/12/16 13:26:37
@@ -904,8 +904,9 @@ sub text {
 
     my $text = shift;
     return unless length $text;
+    my $is_cdata = shift;
 
-    HTML::Entities::decode($text) unless $ignore_text;
+    HTML::Entities::decode($text) unless $ignore_text || $is_cdata;
     
     my($indent, $nugget);
     if($Debug) {
Index: t/parsefile.t
===================================================================
RCS file: /home/cvs/aas/perl/mods/html-tree-s/t/parsefile.t,v
retrieving revision 1.1.1.1
diff -u -p -u -r1.1.1.1 parsefile.t
--- t/parsefile.t       1999/12/16 13:17:53     1.1.1.1
+++ t/parsefile.t       1999/12/16 13:29:47
@@ -17,6 +17,8 @@ This is some text and this is a simple <
 href="http://www.sn.no/libwww-perl/">link</a> back to the
 <b>libwww-perl</b> site.
 
+<xmp>use &lt;</xmp>
+
 <foo a=b
 
 EOT
@@ -25,6 +27,8 @@ close(F);
 $h = HTML::TreeBuilder->new;
 $h->parse_file($file);
 unlink($file);
+
+print $h->dump;
 
 $_ = $h->as_HTML;
 print $_;
