I have now looked through the new code and must simply say that I am
impressed.  It looks like it will actually work now  :-)

Some comments:

* We should really have a common module (perhaps HTML::DTDdata)
  that just contains the information that can be extracted about
  HTML elements/attributes from HTML DTDs.  For instance,
  %linkElements should not have to be maintained in both
  HTML::LinkExtor and HTML::Element.  The 'eg/hrefsub' script
  from HTML-Parser also needs this info.  The 'dtd2pm' script
  was an attempt to do something like this once.  I think it
  is a good idea to have this module generated automagically
  from the most current W3C DTD.

* Can't the $verbose_for_text argument to HTML::Element->traverse
  just be eliminated and always assumed to be TRUE?  Passing extra
  arguments to the callback should not (normally) break anything.

* If find_by_tag_name/find_by_attribute were defined to return the
  first element found in scalar context, then they could be
  modified to stop searching as soon as an element is found.
  Currently, I think, they return the number of elements found
  in scalar context.

* 'attr_get_i' is a strange name, I think.  What does the "_i" mean?

* I think we still have a memory leak.  I need to investigate.
  I have attached my test program (the print_mem function probably
  only works under Linux).
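A minimal sketch of the shared-table idea from the first comment, assuming a hypothetical HTML::DTDdata package (the name proposed above) that owns a single %linkElements table mapping each element to the attributes that hold URLs. The entries are an illustrative subset, not data extracted from a W3C DTD, and links_in() is a hypothetical helper showing how HTML::LinkExtor, HTML::Element and eg/hrefsub could all read the same table instead of keeping their own copies.

```perl
#!/usr/bin/perl
use strict;
use warnings;

{
    # Hypothetical shared module (the name suggested above). In practice
    # this would live in HTML/DTDdata.pm, ideally generated from a DTD.
    package HTML::DTDdata;

    # Illustrative subset only: element name => attributes holding URLs.
    our %linkElements = (
        a      => ['href'],
        img    => ['src', 'usemap'],
        form   => ['action'],
        link   => ['href'],
        script => ['src'],
    );
}

# Hypothetical consumer: given a tag and its attributes, return the
# attribute/URL pairs that the shared table says are links.
sub links_in {
    my ($tag, $attr) = @_;
    my $wanted = $HTML::DTDdata::linkElements{ lc $tag } or return;
    return map { exists $attr->{$_} ? ($_ => $attr->{$_}) : () } @$wanted;
}

my %links = links_in('IMG', { src => 'foo.gif', alt => 'a picture' });
print "$_ => $links{$_}\n" for sort keys %links;    # prints "src => foo.gif"
```

Because every consumer looks the tag up in the same hash, regenerating that one table from a newer DTD would update all of them at once.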

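To illustrate why always passing the extra text-node arguments should be harmless, here is a minimal sketch over a hypothetical tree (hashrefs for elements, plain strings for text). The traverse() below is not HTML::Element's method; it simply behaves as if $verbose_for_text were always TRUE, appending a parent and an index for text nodes. An older callback that only unpacks its first three arguments keeps working, because Perl silently ignores surplus arguments.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tree: elements are hashrefs, text nodes are plain strings.
my $tree = { tag => 'p', content => ['Hello ', { tag => 'b', content => ['world'] }] };

# Sketch of a traverse that ALWAYS passes ($parent, $index) for text nodes.
sub traverse {
    my ($node, $cb, $depth, $parent, $index) = @_;
    $depth ||= 0;
    unless (ref $node) {
        $cb->($node, 1, $depth, $parent, $index);    # text node: extra args
        return;
    }
    $cb->($node, 1, $depth);                         # start tag
    my $i = 0;
    traverse($_, $cb, $depth + 1, $node, $i++) for @{ $node->{content} };
    $cb->($node, 0, $depth);                         # end tag
}

# Old-style callback: only looks at the first three arguments.
my @text;
traverse($tree, sub {
    my ($node, $start, $depth) = @_;
    push @text, $node if !ref $node;
});
print join('', @text), "\n";    # prints "Hello world"

# Newer callback: uses the extra arguments when it wants them.
my @where;
traverse($tree, sub {
    my ($node, $start, $depth, $parent, $index) = @_;
    push @where, "$parent->{tag}\[$index]" if !ref $node;
});
print "@where\n";               # prints "p[0] b[0]"
```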
Regards,
Gisle



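#!/usr/bin/perl -w
use strict;

use HTML::TreeBuilder;

# Build and destroy the same small tree many times; if memory is
# released properly, the reported VmSize should stay flat.
for my $i (1 .. 20000) {
    my $p = HTML::TreeBuilder->new();
    $p->parse("foo<a=foo><img src='foo'></foo><bar>");  # deliberately malformed
    $p->eof;
    #$p->dump;
    $p->delete;    # break the tree's parent/child reference cycles
    undef($p);

    print_mem() unless $i % 1000;
}

# Print the process size.  Reads /proc/self/status, so Linux only.
sub print_mem
{
    open(my $stat, '<', '/proc/self/status')
        or die "Can't open /proc/self/status: $!";
    while (<$stat>) {
        print if /^VmSize/;
    }
    close($stat);
}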
__END__
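The scalar-context suggestion for find_by_tag_name/find_by_attribute could look roughly like the following sketch. It is not HTML::Element's actual code: it works on a hypothetical tree of hashrefs (text nodes as plain strings), walks it in document order, and uses wantarray so that scalar context returns the first match and stops searching immediately, while list context still returns every match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical tree: elements are { tag => ..., content => [...] },
# text nodes are plain strings.
sub find_by_tag_name {
    my ($root, @tags) = @_;
    my %want = map { lc($_) => 1 } @tags;
    my $first_only = !wantarray;    # scalar (or void) context: first match only
    my @found;
    my @pending = ($root);
    while (@pending) {
        my $e = shift @pending;
        next unless ref $e;                          # skip text nodes
        if ($want{ lc $e->{tag} }) {
            return $e if $first_only;                # stop as soon as one is found
            push @found, $e;
        }
        unshift @pending, @{ $e->{content} || [] };  # children first: document order
    }
    return $first_only ? undef : @found;
}

my $doc = { tag => 'body', content => [
    { tag => 'p', content => [ { tag => 'a', content => ['one'] } ] },
    { tag => 'a', content => ['two'] },
] };

my @all   = find_by_tag_name($doc, 'a');    # list context: every <a>
my $first = find_by_tag_name($doc, 'A');    # scalar context: first <a> only
print scalar(@all), " ", $first->{content}[0], "\n";    # prints "2 one"
```

Callers that want the count can still write `my $n = () = $elem->find_by_tag_name(...)`, so little is lost by changing the scalar-context meaning.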
