I have a problem while trying to build a spider using Perl threads. Consider the program below, which is just an example to get going: I want to hit a certain site's front page a number of times (for example, 300). Since there is a lot of content on the page, each request takes some time to process, so I imagined it would be nice to delegate the requests to threads. My problem is that in this very naive example, the unthreaded version is much faster.
Two questions: 1) Is there something wrong with the threaded code? 2) Does anyone have a working example of a spider using threads?
Thanks ./allan
#############################################################
use strict;
use warnings;
use threads;
use threads::shared;
use LWP::RobotUA;

my $MAX = 300;
my %store :shared;
my $robot;
my $count;
my $thr;

my $start   = time();
my $url     = "http://somewhere.com";
my $THREADS = 0;

init_robot();

# If we have an argument, use the unthreaded version.
if ($ARGV[0]) {
    main_loop2();
}
else {
    $THREADS = 1;
    main_loop();
}
print_hash();

my $end     = time();
my $elapsed = $end - $start;
print "This took $elapsed seconds\n";

sub init_robot {
    $robot = LWP::RobotUA->new("myname", '[EMAIL PROTECTED]');
    my $delay = 1/6000;    # delay between requests, in minutes
    $robot->delay($delay);
}

# Threaded version. Note that join() blocks until the thread has
# finished, so each iteration waits for its request to complete
# before the next thread is created.
sub main_loop {
    while ($count < $MAX) {
        $count++;
        $thr = threads->new(\&lwp);
        $thr->join;
    }
}

# Unthreaded version.
sub main_loop2 {
    while ($count < $MAX) {
        $count++;
        lwp();
    }
}

sub lwp {
    my $response = $robot->get($url);
    my $content  = $response->content;
    lock(%store) if $THREADS;    # lock is held until the sub returns
    if ($content =~ m,<title>([^<>]+)</title>,i) {
        $store{$count} = $1;
    }
    return $count;
}

sub print_hash {
    foreach my $key (keys %store) {
        print "$key --> $store{$key}\n";
    }
}
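For comparison, here is a minimal sketch of what I mean by delegating the requests: the threads are started in batches and only joined after the whole batch is running, so the requests within a batch can overlap. The batch size, the fetch_title helper, and passing the counter as an argument are my own assumptions, not tested code:

#############################################################
use strict;
use warnings;
use threads;
use threads::shared;
use LWP::RobotUA;

my $MAX = 300;
my $url = "http://somewhere.com";
my %store :shared;

my $robot = LWP::RobotUA->new("myname", '[EMAIL PROTECTED]');
$robot->delay(1/6000);    # minutes between requests

sub fetch_title {
    my ($id) = @_;        # counter passed in at creation time
    my $content = $robot->get($url)->content;
    if ($content =~ m,<title>([^<>]+)</title>,i) {
        lock(%store);
        $store{$id} = $1;
    }
}

my $batch = 30;           # assumed batch size; tune as needed
my $id    = 0;
while ($id < $MAX) {
    my @threads;
    for (1 .. $batch) {
        last if $id >= $MAX;
        $id++;
        push @threads, threads->create(\&fetch_title, $id);
    }
    # Join only after the whole batch has been started, so the
    # requests in a batch actually run concurrently.
    $_->join for @threads;
}

print "$_ --> $store{$_}\n" for sort { $a <=> $b } keys %store;
#############################################################

The batching is there because each Perl thread clones the interpreter, so starting all 300 at once would be quite heavy on memory.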