Jack Goldstein <[EMAIL PROTECTED]> writes:

> I've installed HTML::Parser on an AIX 5.1 system running perl 5.8.6 along
> with HTML::Tagset and all tests passed except for one relating to POD that
> was skipped. However, one of our developers found that it didn't properly
> parse titles. Here's a sample program that demonstrates the problem. When
> run with the perl 5.8.6 that I installed, the output is
>
>   Help Title is
>   (blank line)
>
> but when run with a copy of perl5.8.0 that someone else installed, we
> get:
>
>   Help Title is Installation Help
>
> which I assume is correct.

Thanks for your bug report. This is indeed a bug. The cause is that,
under certain circumstances, some events could still fire even after a
handler had told the parser to stop with $p->eof. I've now fixed this
issue and uploaded HTML-Parser-3.49 to CPAN.

My guess would be that your perl5.8.0 installation has a version of
HTML-Parser that is older than version 3.40, where we made <title> tags
also parse in literal mode. This could explain why this issue didn't
occur with that perl installation.

> use HTML::Parser;
>
> my $title='';
>
> my $p = HTML::Parser->new(api_version => 3,);
> $p->handler(start=> \&title_handler, 'tagname, self');
> $p->parse_file("db2wi.htm");
> print "\nHelp Title is $title\n";
> exit 0;
>
> ########################################
> # Subroutines
> ########################################
> sub title_handler {
>     return if shift ne 'title';
>     my $self = shift;
>     $self->handler(text => sub { $title= shift}, 'dtext');

BTW, HTML-Parser does not guarantee that all text between the
<title>...</title> tags is reported in a single callback, which means
this code should append to $title instead of just assigning to it.
That would make it:

    $self->handler(text => sub { $title .= shift}, 'dtext');

Alternatively, set the 'unbroken_text' attribute to a TRUE value.

>     $self->handler(end => sub { shift->eof if shift eq 'title' }, 'tagname, self');
> }

Regards,
Gisle