Re: Extracting Titles from a bunch of HTML files

John W. Krahn Wed, 02 Jan 2002 04:13:54 -0800

"Richard S. Crawford" wrote:
> 
> I have a directory containing over 250 HTML files.  What is the best way to
> extract the title (between the <TITLE> and </TITLE> tags) of each file
> without having to open the file and read in the contents, which seems like
> it would be very slow?


The only way you can read the contents of the files is to actually open
them and read the contents.  :-)

#!/usr/bin/perl -w
use strict;

chdir '/dir/that/stores/html'
    or die "Cannot chdir to '/dir/that/stores/html': $!";

my @titles;
$/ = 'TITLE>';   # read records ending in 'TITLE>'; this won't work if your HTML tags are in lower case!
while ( <*.html> ) {
    if ( open HTML, $_ ) {
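        <HTML>;   # discard everything up to and including the first 'TITLE>'
        my $title = <HTML>;   # the title text followed by '</TITLE>'
        if ( defined $title ) {
            chomp $title;        # chomp removes the trailing $/, i.e. 'TITLE>'
            $title =~ s|</$||;   # then strip the leftover '</'
            push @titles, $title;
            }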
        close HTML;
        }
    else {
        warn "Cannot open $_: $!";
        }
    }

print join( "\n", @titles ), "\n";

__END__
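
If the tags may be in lower or mixed case, or carry attributes, one option
is to slurp each file and match the title with a case-insensitive regex.
The following is only a sketch under those assumptions, not a real HTML
parser: extract_title is a hypothetical helper name, the files are taken
from the command line rather than from a fixed directory, and a title
hidden inside a comment or script block would fool it.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return the text of
# the first <title> element in $file, or nothing if there is none.
sub extract_title {
    my ($file) = @_;
    open my $fh, '<', $file
        or do { warn "Cannot open $file: $!"; return };
    local $/;                 # slurp mode: read the whole file at once
    my $html = <$fh>;
    close $fh;
    return unless defined $html;

    # /i ignores tag case, /s lets . match newlines,
    # [^>]* allows attributes inside the opening tag
    return unless $html =~ m{<title\b[^>]*>(.*?)</title\s*>}is;

    my $title = $1;
    $title =~ s/\s+/ /g;      # collapse line breaks and runs of whitespace
    $title =~ s/^ | $//g;     # trim the leading and trailing space
    return $title;
}

# Usage: perl titles.pl *.html   (titles.pl is an assumed script name)
for my $file (@ARGV) {
    my $title = extract_title($file);
    print "$file: ", ( defined $title ? $title : '(no title)' ), "\n";
}
```

Slurping costs a little more memory than reading up to 'TITLE>', but for
a few hundred HTML files that should not matter. For pages that are not
well formed, a proper parser module from CPAN is the safer route.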



John
-- 
use Perl;
program
fulfillment
