Hi. I'm trying to take a list of URLs:
www.foo.com
www.bar.edu
www.baz.gov
and get their titles. The problem is that some of the URLs don't exist. As I
understand it, setting a low timeout in the user agent doesn't really apply
until after a connection has been made and data is being processed, so I can
end up waiting quite a while before a request for a non-existent URL times out!
Here's the code I was using (copied from the LWP examples):
<code>
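#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
use URI::Heuristic;

$| = 1;    # unbuffered output, so each URL is printed before its fetch starts

# The input file (one URL per line) is named on the command line.
my $ifile = shift @ARGV or die "usage: $0 url-list-file\n";
open(my $in, '<', $ifile) or die "can't open $ifile - $!\n";

# One user agent is enough for every request.
my $ua = LWP::UserAgent->new();
$ua->agent("LoverlyBrower");
$ua->timeout(10);

while (my $raw_url = <$in>) {
    chomp $raw_url;
    next unless length $raw_url;    # skip blank lines
    my $url = URI::Heuristic::uf_urlstr($raw_url);
    print $url . "\t";

    my $req = HTTP::Request->new(GET => $url);
    $req->referer("http://www.toto.oz");
    my $response = $ua->request($req);
    if ($response->is_error()) {
        print $response->status_line . "\n";
    } else {
        # title() returns undef when the page has no title element
        my $title = $response->title();
        print((defined $title ? $title : "") . "\n");
    }
}
close($in);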
</code>
Should I wrap this code in something that first checks whether the host can
be found, before trying to get the title from a web site running on that
host?
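
For illustration, here is a rough sketch of what such a pre-check might look
like, using only the core Socket module. The helper names host_of and
host_resolves are made up for this example, and the URL pattern is a
simplification (http/https only, no IPv6 literals). Note the assumptions:
inet_aton only looks for an IPv4 address, the lookup itself can still block
for a while if the resolver is slow, and a host that resolves is not
necessarily running a web server.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_aton);

# Hypothetical helper: pull the host name out of an http/https URL.
# Simplified pattern; returns undef if the string doesn't look like a URL.
sub host_of {
    my ($url) = @_;
    return undef unless defined $url;
    my ($host) = $url =~ m{^https?://(?:[^/?#@]*@)?([^/?#:]+)}i;
    return $host;
}

# Hypothetical helper: true if the resolver returns an IPv4 address for $host.
sub host_resolves {
    my ($host) = @_;
    return 0 unless defined $host && length $host;
    return defined inet_aton($host) ? 1 : 0;
}

# Report, for each URL given on the command line, whether its host resolves.
for my $url (@ARGV) {
    my $ok = host_resolves(host_of($url));
    print "$url\t", ($ok ? "host resolves" : "host not found"), "\n";
}
```

In the loop above, the idea would be to call host_resolves(host_of($url))
right after uf_urlstr and skip the request when it returns false.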
--------------------------------------------------------------------------
Michael Bauer http://www.michaelbauer.com [EMAIL PROTECTED]