Jack Goldstein <[EMAIL PROTECTED]> writes:

> I've installed HTML::Parser on an AIX 5.1 system running perl 5.8.6 along 
> with HTML::Tagset and all tests passed except for one relating to POD that 
> was skipped.  However, one of our developers found that it didn't properly 
> parse titles.  Here's a sample program that demonstrates the problem. When 
> run with the perl 5.8.6 that I installed, the output is
> 
>         Help Title is 
>                                 (blank line)
> 
> but when run with a copy of perl5.8.0 that someone else installed, we 
> get:
> 
>         Help Title is Installation Help
> 
> which I assume is correct.

Thanks for your bug report.  This is indeed a bug.  It happens because,
in certain circumstances, some events would still fire after a handler
had told the parser to stop by calling $p->eof.  I've now fixed this
and uploaded HTML-Parser-3.49 to CPAN.
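If you want to check the fixed behaviour yourself, here is a minimal sketch.  It assumes 3.49 or later is installed, and the HTML string and variable names are made up for illustration.  An end handler calls eof when it sees </title>.  After that, only the title text should have been collected, and parse() should return a false value, which is what the documentation says happens when eof is called from inside a handler.

```perl
use strict;
use warnings;
use HTML::Parser 3.49;   # assumption: a release that includes the eof fix

my @seen;   # every text chunk the parser reports, in order

my $p = HTML::Parser->new(
    api_version => 3,
    # Record all text, decoded.
    text_h => [ sub { push @seen, shift }, 'dtext' ],
    # Stop parsing as soon as </title> is seen.
    end_h  => [ sub {
        my ($self, $tag) = @_;
        $self->eof if $tag eq 'title';
    }, 'self, tagname' ],
);

# Hypothetical sample document: "Body text" comes after </title>
# and should never reach the text handler.
my $rv = $p->parse(
    '<html><head><title>Installation Help</title></head><body>Body text</body></html>'
);

# parse() returns a false value once a handler has called eof.
print $rv ? "still parsing\n" : "stopped\n";
print "Collected: ", join('', @seen), "\n";
```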

My guess would be that your perl5.8.0 installation has a version of
HTML-Parser older than 3.40, which is the release where we started
parsing <title> content in literal mode as well.  That would explain
why the problem didn't show up with that perl installation.

> use HTML::Parser;
> 
> my $title='';
> 
> my $p = HTML::Parser->new(api_version => 3,);
> $p->handler(start=> \&title_handler, 'tagname, self');
> $p->parse_file("db2wi.htm");
> print "\nHelp Title is $title\n";
> exit 0;
> 
> ########################################
> # Subroutines
> ########################################
> sub title_handler {
>  return if shift ne 'title';
>  my $self = shift; 
>  $self->handler(text => sub { $title= shift}, 'dtext');

BTW, HTML-Parser does not guarantee that all the text between the
<title>...</title> tags is reported in a single callback, so this
code should append to $title instead of just assigning to it.
That would make it:

   $self->handler(text => sub { $title .= shift}, 'dtext');

Alternatively, set the 'unbroken_text' attribute to a TRUE value
(e.g. $p->unbroken_text(1)) so that each run of text is reported in
one piece.

>  $self->handler(end => sub { shift->eof if shift eq 'title' }, 'tagname, self');
> }
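Putting both suggestions together, the title extraction might look roughly like the sketch below.  The extract_title helper and the sample HTML string are hypothetical and only serve as illustration.  The sketch parses an in-memory string instead of db2wi.htm.  It turns on unbroken_text and still appends to $title.  It only calls $p->eof itself when no handler has already stopped the parser.

```perl
use strict;
use warnings;
use HTML::Parser 3.49;   # assumption: a release that includes the eof fix

# Hypothetical helper: return the text of the first <title> element in $html.
sub extract_title {
    my ($html) = @_;
    my $title = '';

    my $p = HTML::Parser->new(api_version => 3);
    $p->unbroken_text(1);    # report each run of text in one piece

    $p->handler(start => sub {
        my ($tagname, $parser) = @_;
        return unless $tagname eq 'title';

        # Append rather than assign, in case the text still arrives in pieces.
        $parser->handler(text => sub { $title .= shift }, 'dtext');

        # Stop parsing as soon as the title element closes.
        $parser->handler(end => sub {
            my ($tag, $self) = @_;
            $self->eof if $tag eq 'title';
        }, 'tagname, self');
    }, 'tagname, self');

    # parse() returns false once a handler has called eof; only flush
    # the parser ourselves if it is still running.
    $p->eof if $p->parse($html);

    return $title;
}

print "Help Title is ", extract_title(
    '<html><head><title>Installation Help</title></head><body>Body</body></html>'
), "\n";
```

For a real file, you would read db2wi.htm into a string first, or keep using parse_file with the same handlers.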

Regards,
Gisle
