Does it work for this kind of URL? http://patft.uspto.gov/netacgi/nph-Parser?

Thanks!
T.
On 1/22/07, Igor Sutton <[EMAIL PROTECTED]> wrote:
Hi Tatiana,

2007/1/22, Tatiana Lloret Iglesias <[EMAIL PROTECTED]>:
> I've realized that for each link, I spend most of the time in the following
> Perl script:
>
> foreach my $url (@lines){    -- I READ MY 1-ROW URL FILE
>     $contador=2;
>     $test=0;
>     while(!$test){
>
>         $browser2->get($url);
>         $content = $browser2->content();
>
> -- IN THESE 2 STEPS I SPEND 6 SECONDS for an 86 KB HTML page. Is that OK? Can I
> perform these 2 steps faster?
>

Are you using the domain name or the IP address in the link (e.g. http://www.google.com/ or http://1.2.3.4)? If you are using the first, Perl will first contact your DNS server or cache, and only then connect and retrieve the contents you want.

If you are not using a DNS cache, you can build one using Net::DNS for the lookup and Memoize for caching. Check the example:

<code>
#!/usr/bin/env perl
use strict;
use warnings;

use Benchmark::Timer;
use Carp;
use Memoize;
use Net::DNS;

# used by get_ip_from_hostname
my $resolver = Net::DNS::Resolver->new;

# Return the first A record (IPv4 address) found for $hostname.
sub get_ip_from_hostname {
    my ($hostname) = @_;
    my $query = $resolver->search($hostname);
    if ($query) {
        foreach my $rr ( $query->answer ) {
            next unless $rr->type eq 'A';
            return $rr->address;
        }
    }
    else {
        croak "Query failed: ", $resolver->errorstring;
    }
}

my $t = Benchmark::Timer->new();

# 1000 lookups without caching: every call goes through the resolver.
for ( 1 .. 1000 ) {
    $t->start('get_ip_from_hostname without memoize');
    my $ip = get_ip_from_hostname("www.google.com");
    $t->stop('get_ip_from_hostname without memoize');
}

print $t->report();
$t->reset();

# Same 1000 lookups with caching: only the first call does a real query.
memoize('get_ip_from_hostname');

for ( 1 .. 1000 ) {
    $t->start('get_ip_from_hostname memoize');
    my $ip = get_ip_from_hostname("www.google.com");
    $t->stop('get_ip_from_hostname memoize');
}

print $t->report();
</code>

HTH!

--
Igor Sutton Lopes <[EMAIL PROTECTED]>
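
On the question of whether this works for a URL such as the patft.uspto.gov one: the cache is keyed by hostname, so it does not matter what the path or query string looks like. Below is a minimal sketch of how the hostname could be pulled out of each URL and swapped for the cached IP. It assumes plain http URLs and the URI module (not used in the original script). The names `url_with_ip` and `fake_resolve` are hypothetical, and `fake_resolve` only stands in for `get_ip_from_hostname` so the sketch runs without a network; 192.0.2.10 is a documentation address, not the real server.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

use Memoize;
use URI;

# Hypothetical helper: replace the hostname of an http URL with an IP.
# $resolve is any code reference that maps a hostname to an IP string,
# for example a memoized get_ip_from_hostname.
# Returns the rewritten URL and the original hostname.
sub url_with_ip {
    my ( $url, $resolve ) = @_;
    my $uri  = URI->new($url);
    my $host = $uri->host;
    my $ip   = $resolve->($host);

    # If the lookup gave nothing, leave the URL untouched.
    return ( $url, $host ) unless defined $ip;

    $uri->host($ip);
    return ( $uri->as_string, $host );
}

# Stand-in resolver (assumption): counts how often it is really called.
my $lookups = 0;

sub fake_resolve {
    my ($hostname) = @_;
    $lookups++;
    return '192.0.2.10';
}

# memoize() returns a reference to the cached version of the function.
my $cached_resolve = memoize('fake_resolve');

for my $path (qw(/a.html /b.html /c.html)) {
    my ( $ip_url, $host ) =
      url_with_ip( "http://patft.uspto.gov$path", $cached_resolve );
    print "$ip_url (Host: $host)\n";
}

print "resolver called $lookups time(s)\n";
```

Three URLs on the same host cost one lookup here. One caveat: a server that hosts several sites on one IP still needs the original name, so the request would probably have to carry it in the Host header, e.g. something like `$browser2->get($ip_url, Host => $host)`. It is also worth timing the `get` call on its own first, since DNS may account for only a small part of the 6 seconds.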