I have now looked through the new code and must simply say that I am
impressed. It looks like it will actually work now :-)
Some comments:
* We should really have a common module (perhaps HTML::DTDdata)
that just contains the information that can be extracted about
HTML elements/attributes from HTML DTDs. For instance,
%linkElements should not have to be maintained both in
HTML::LinkExtor and HTML::Element. The 'eg/hrefsub' script
from HTML-Parser also needs this info. The 'dtd2pm' script
was an attempt to do something like this once. I think it
is a good idea to have this module generated automagically
from the most current W3C DTD.
* Can't the $verbose_for_text argument to HTML::Element->traverse
just be eliminated and assumed to be TRUE always? Adding arguments
should not (normally) break anything.
* If find_by_tag_name/find_by_attribute were defined to return the
first element found in scalar context, then they could be
modified to stop searching as soon as an element is found.
Currently, I think, they return the number of elements found
in scalar context.
* 'attr_get_i' is a strange name, I think. What does the "_i" mean?
* I think we still have a memory leak. I need to investigate.
I have attached my test program (the print_mem function probably
only works under Linux).
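
To make the scalar-context suggestion above concrete, here is a rough sketch of what I have in mind. It does not use HTML::Element at all: the node() helper, the hash-based tree and the $visited counter are all made up for illustration, and this find_by_tag_name is a stand-in, not the real method. It only shows how wantarray lets the search stop at the first match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a parsed tree: each node is a hash with a
# tag name and a list of child nodes. This is NOT HTML::Element.
sub node { my ($tag, @kids) = @_; return { tag => $tag, kids => [@kids] } }

our $visited = 0;   # counts visited nodes, to make the early exit visible

# Sketch of a context-sensitive finder (assumed behaviour, not the
# current one): list context returns every match, scalar context
# returns the first match and stops searching.
sub find_by_tag_name {
    my ($root, $tag) = @_;
    my $want_all = wantarray;
    my @found;
    my @pending = ($root);
    while (@pending) {
        my $n = shift @pending;
        $visited++;
        if ($n->{tag} eq $tag) {
            return $n unless $want_all;     # scalar context: stop here
            push @found, $n;
        }
        unshift @pending, @{ $n->{kids} };  # depth-first, document order
    }
    return $want_all ? @found : undef;      # undef if nothing matched
}

my $tree = node('html', node('body', node('a'), node('img'), node('a')));

$visited = 0;
my @all = find_by_tag_name($tree, 'a');     # list context: full search
my $visited_list = $visited;
printf "list context:   %d matches, %d nodes visited\n",
    scalar(@all), $visited_list;

$visited = 0;
my $first = find_by_tag_name($tree, 'a');   # scalar context: early exit
printf "scalar context: tag '%s', %d nodes visited\n",
    $first->{tag}, $visited;
```

The trade-off is that anyone who relies on the count in scalar context would have to write scalar(() = ...) or assign to an array first, so this would be a change in behaviour that needs documenting.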
Regards,
Gisle
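#!/usr/bin/perl -w
use strict;
use HTML::TreeBuilder;

# Build and destroy many trees; a steadily growing VmSize suggests a leak.
for (1..20000) {
    my $p = HTML::TreeBuilder->new();
    $p->parse("foo<a=foo><img src='foo'></foo><bar>");
    $p->eof;
    #$p->dump;
    $p->delete;
    undef($p);
    print_mem() unless $_ % 1000;   # report every 1000 iterations
}

# Print this process's virtual memory size (reads /proc, so Linux only).
sub print_mem
{
    open(my $stat, "<", "/proc/self/status")
        or die "Can't open /proc/self/status: $!";
    while (<$stat>) {
        print if /^VmSize/;
    }
    close($stat);
}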
__END__