Hi. I'm trying to take a list of URLs:

  www.foo.com
  www.bar.edu
  www.baz.gov

and get their titles. The problem is that some of the URLs don't exist.
From what I understand, the timeout set on the LWP::UserAgent doesn't
really apply until a connection has been made and data is being
transferred, so for a non-existent URL I can wait quite a while before
anything times out. Here's the code I was using (copied from the LWP
examples):

<code>
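#!/usr/bin/perl
use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Request;
use URI::Heuristic;

# The file name is the first argument (no trailing "\n" appended).
my $ifile = $ARGV[0]
    or die "usage: $0 url-list-file\n";

open(my $in, '<', $ifile) or die "can't open $ifile - $!\n";

$| = 1;                              # unbuffered output, set once

# One user agent is enough for the whole list.
my $ua = LWP::UserAgent->new();
$ua->agent("LoverlyBrower");
$ua->timeout(10);                    # inactivity timeout, not a total deadline

while (my $raw_url = <$in>) {
    $raw_url =~ s/^\s+|\s+$//g;      # strip surrounding whitespace and newline
    next unless length $raw_url;     # skip blank lines
    my $url = URI::Heuristic::uf_urlstr($raw_url);
    print $url, "\t";
    my $req = HTTP::Request->new(GET => $url);
    $req->referer("http://www.toto.oz");
    my $response = $ua->request($req);
    if ($response->is_error()) {
        print $response->status_line, "\n";
    } else {
        # title() comes from the <title> LWP parsed out of the HTML head
        my $title = $response->title();
        print defined $title ? $title : '(no title)', "\n";
    }
}
close($in);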
</code>

Should I embed this code in something that first checks whether the host
can be found at all, before trying to get the title from a web site
running on that host?
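A pre-check only helps if the lookup itself is bounded, since an unbounded gethostbyname can block just as long as the request. Below is a minimal sketch of such a check, assuming a Unix-like system; with_timeout and host_resolves are hypothetical helper names, not part of LWP. It uses POSIX::sigaction instead of $SIG{ALRM} because, on Perl 5.8 and later, %SIG handlers are deferred and may not run until a blocking resolver call returns on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX ();

# Hypothetical helper: run $code with a hard wall-clock limit of $secs.
# Returns ($ok, $result); $ok is false if the deadline was hit.
# POSIX::sigaction installs a non-deferred handler, so SIGALRM can
# interrupt a blocking call (typical Unix behaviour, assumed here).
sub with_timeout {
    my ($secs, $code) = @_;
    my $old = POSIX::SigAction->new;
    my $new = POSIX::SigAction->new(sub { die "timeout\n" });
    POSIX::sigaction(POSIX::SIGALRM(), $new, $old)
        or die "sigaction failed: $!\n";
    my $result;
    my $ok = eval {
        alarm($secs);
        $result = $code->();
        alarm(0);
        1;
    };
    my $err = $@;
    alarm(0);                                   # make sure no alarm is pending
    POSIX::sigaction(POSIX::SIGALRM(), $old);   # restore the previous handler
    return (1, $result) if $ok;
    return (0, undef) if $err eq "timeout\n";
    die $err;                                   # re-throw unrelated errors
}

# Hypothetical pre-check: true if $host resolves within $secs seconds.
sub host_resolves {
    my ($host, $secs) = @_;
    # In scalar context gethostbyname returns the packed address or undef.
    my ($ok, $addr) = with_timeout($secs, sub { scalar gethostbyname($host) });
    return $ok && defined $addr;
}
```

In the loop above, a call such as host_resolves(URI->new($url)->host, 5) right after uf_urlstr (with a "use URI;" at the top) would let the script print an error and move on when the host doesn't resolve. The pre-check costs an extra lookup per host; wrapping $ua->request itself in with_timeout is an alternative that bounds DNS, connect and transfer together.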

--------------------------------------------------------------------------
Michael Bauer      http://www.michaelbauer.com

