"Richard S. Crawford" wrote:
>
> I have a directory containing over 250 HTML files. What is the best way to
> extract the title (between the <TITLE> and </TITLE> tags) of each file
> without having to open the file and read in the contents, which seems like
> it would be very slow?

The only way to read the contents of a file is to actually open it and
read it. :-) You don't have to read the whole file, though: the script
below stops reading each file as soon as it has the title.
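
#!/usr/bin/perl
use strict;
use warnings;

my $dir = '/dir/that/stores/html';
chdir $dir or die "Cannot chdir to '$dir': $!";

my @titles;
$/ = 'TITLE>';    # This won't work if your HTML tags are in lower case!
for my $file ( glob '*.html' ) {
    open my $html, '<', $file or do {
        warn "Cannot open $file: $!";
        next;
    };
    <$html>;                      # discard everything up to and including '<TITLE>'
    my $title = <$html>;          # the title text followed by '</TITLE>'
    close $html;
    next unless defined $title;   # this file has no TITLE tag
    chomp $title;                 # strip the trailing 'TITLE>' (the value of $/)
    $title =~ s|</$||;            # strip the '</' left over from '</TITLE>'
    push @titles, $title;
}
print join( "\n", @titles ), "\n";
__END__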
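
If your tags might be in lower or mixed case, the $/ trick above will not
help, because $/ is a plain string and not a pattern. One possible
alternative, offered only as a sketch: read each file line by line with the
default $/ and stop at the first complete title, matched case-insensitively.
The helper name title_of is my own invention for illustration, and a regex
like this assumes reasonably well-formed HTML; it is not a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the script above): return the text of the
# first <title> element in $file, with the tag matched in any letter case.
# Returns undef if the file cannot be opened or holds no complete title.
sub title_of {
    my ($file) = @_;
    open my $fh, '<', $file or return;
    my $seen = '';
    while ( my $line = <$fh> ) {
        $seen .= $line;
        # Stop reading as soon as the closing tag has been seen.
        if ( $seen =~ m{<title\b[^>]*>(.*?)</title\s*>}is ) {
            my $title = $1;
            $title =~ s/\s+/ /g;     # collapse line breaks inside the title
            $title =~ s/^ | $//g;    # trim one leading or trailing space
            return $title;
        }
    }
    return;    # reached end of file without a complete title
}

# Print the title of every .html file in the current directory.
print "$_\n" for grep { defined } map { title_of($_) } glob '*.html';
```

Run it from the directory that holds the HTML files. Since the title normally
sits near the top of the document, only the first few lines of each file
should end up being read.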
John
--
use Perl;
program
fulfillment