Hi,
I'm using the following code to try to delete <p> elements and their
contents from HTML pages. My ultimate goal is to strip certain <p>
elements while retaining others. However, the following code doesn't
strip all of the <p> nodes. I'm wondering if my understanding
of HTML::Element is flawed (quite possible), my usage is bad (also
possible), or if this is some kind of bug.
If I call $tree->traverse(\&callbacktest); repeatedly, I eventually strip
out all the <p> nodes, but that would make my end goal, identifying and
stripping only certain nodes, difficult.
Here is the basic, flawed code:
use strict;
use HTML::TreeBuilder;   # requires modules from HTML::Parse
use HTML::Element;       # available from http://search.cpan.org/search?module=HTML::Parse

# count the paragraphs - easiest way to identify the ones we don't want?
my $p_counter = 0;

foreach my $file_name (@ARGV) {
    my $tree = HTML::TreeBuilder->new;   # empty tree
    $tree->parse_file($file_name);
    print "Hey, here's a dump of the parse tree of $file_name before we play with it:\n";
    $tree->dump;                         # a method we inherit from HTML::Element
    $tree->traverse(\&callbacktest);
    print "Hey, here's a dump of the parse tree of $file_name after we play with it:\n";
    $tree->dump;
    print "And here it is, bizarrely rerendered as HTML:\n",
        $tree->as_HTML, "\n";
    $tree = $tree->delete;               # Now that we're done with it, we must destroy it.
}

sub callbacktest {
    # get the values passed to the callback function
    my ($node, $start, $depth) = @_;
    if (ref $node) {                     # does $node reference part of the tree?
        my $currenttag = $node->tag;     # if it does, get the tag
        if ($currenttag eq "p") {
            # if the tag is <p>, delete it and everything inside it.
            $node->delete;
        }
    }
    return HTML::Element::OK;
}

Here is sample HTML:
<html>
<head>
</head>
<body>
<center>
<p>First</p>
</center>
<center>
<p>Second</p>
</center>
<p>Third</p>
<p>Fourth</p>
<p>Fifth</p>
<p>Sixth</p>
<center>
<p>Seventh</p>
</center>
</body>
</html>
From running this code on this sample I still have the Fifth and
Seventh <p>s in there.
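
One alternative I have been considering, as a rough sketch only, is to
collect the <p> nodes first and delete them afterwards, so that nothing
is removed while the tree is still being walked. This assumes the
installed HTML::Element provides the look_down method, and the inline
sample string is a made-up stand-in for a file from @ARGV. I have not
confirmed that this is the intended way to use the module.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical inline sample, standing in for a file named in @ARGV.
my $html = '<html><head></head><body><center><p>First</p></center>'
         . '<p>Second</p><p>Third</p><center><p>Fourth</p></center>'
         . '</body></html>';

my $tree = HTML::TreeBuilder->new;
$tree->parse($html);
$tree->eof;

# Collect every <p> element before changing anything in the tree.
# Assumption: look_down is available in the installed HTML::Element.
my @paragraphs = $tree->look_down(_tag => 'p');

# Delete only after the search has finished. A counter or any other
# test could be used here to pick which paragraphs to remove.
my $p_counter = 0;
foreach my $p (@paragraphs) {
    $p_counter++;
    $p->delete;
}

# Count whatever <p> elements are still attached to the tree.
my $remaining = () = $tree->look_down(_tag => 'p');
print "Deleted $p_counter <p> nodes, $remaining left\n";

$tree = $tree->delete;    # destroy the tree when done
```

Since the deletion loop runs over a plain list, skipping selected
paragraphs would just be a condition inside that loop.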
Any suggestions? Anything enlightening to read? I looked around
for examples of using HTML::Element and didn't find
much. I found Randal S.'s Oct 98 Web Techniques article on
using it very helpful. Is there anything more I could read?
Thanks!
Chris Cothrun
[EMAIL PROTECTED]