This is a patch to HTML::HeadParser to let it cope better with some of the badly written web pages out there. A web page should have only one <title> element inside its <head>, but some pages manage to have more than one. An example is <http://asu.info.apple.com/swupdates.nsf/artnum/n10465>, which begins like this:

<HTML>
<!-- Lotus-Domino (Release 4.6.2 (Intl) - 23 July 1998 on AIX) -->
<HEAD>
<TITLE>Apple - Software Updates for LaserWriter Software 8.5.1</TITLE>
[lots of <META> stuff snipped]
<TITLE></TITLE>
</HEAD>

A web browser displays the first title. With HTML::HeadParser, however, only the last title counts, so you get back an empty string.

Of course, you could argue that this is reasonable: if people write invalid HTML, that's their problem. But the de facto standard is what the browser displays, and both Netscape 4.72 and MSIE 5.00 display the first title.

A more general solution is to combine all the non-empty <title> elements into a single title, separated by " // ". Here is a patch:

*** HeadParser.pm.orig	Thu Dec  9 19:07:33 1999
--- HeadParser.pm	Tue Jul 18 12:45:30 2000
***************
*** 135,141 ****
      $text =~ s/\s+/ /g;
      print "FLUSH $tag => '$text'\n" if $DEBUG;
      if ($tag eq 'title') {
!         $self->{'header'}->header(Title => $text);
      }
      $self->{'tag'} = $self->{'text'} = '';
  }
--- 135,156 ----
      $text =~ s/\s+/ /g;
      print "FLUSH $tag => '$text'\n" if $DEBUG;
      if ($tag eq 'title') {
!         my $old_title = $self->{'header'}->header('Title');
!         my $new_title;
!         if (defined $old_title and $old_title !~ /^\s*$/) {
!             # Some badly written pages have more than one title, but
!             # some titles may be empty.  Attempt to sort things out.
!             if ($text !~ /^\s*$/) {
!                 $new_title = "$old_title // $text";
!             }
!             else {
!                 $new_title = $old_title;
!             }
!         }
!         else {
!             $new_title = $text;
!         }
!         $self->{'header'}->header(Title => $new_title);
      }
      $self->{'tag'} = $self->{'text'} = '';
  }

This handles well-written pages with only one title and badly written ones like the Apple/Domino page above, where the empty second title is ignored and the first one is kept. I hope it also handles even worse ones.

I don't subscribe to this list (I stumbled across the problem by accident), so please Cc replies to me.

-- 
Ed Avis [EMAIL PROTECTED]