Francois wrote:
I tried to get data from a site which uses cookies and redirects the
user. I spent a lot of time always getting the same result, "connection
timed out", until I realised that everything was fine if I didn't send
the header...
Thanks for any explanations!
Francois
Here is my code:
use strict;
use warnings;
use LWP;
use HTML::Parser;
use HTML::FormatText;
use HTML::Tree;
# use DateTime::Duration;
use HTTP::Headers;
use HTTP::Cookies;
use HTTP::Cookies::Netscape;
use CGI qw(header -no_debug);
my $h = HTTP::Headers->new(
Accept => "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5",
Host => "www.unifr.ch",
);
$h->server("Apache/2.0.46 (Red Hat)");
$h->user_agent("Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9");
my $reflink = "http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095";
my $c = HTTP::Cookies::Netscape->new(file=>'cookies.txt', autosave=>"1");
my $ua_short = LWP::UserAgent->new(cookie_jar => $c, timeout=> 20);
$ua_short->agent("Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9");
# with this line the header is sent with my request and it does not work
# my $req = HTTP::Request->new(GET=>$reflink, $h);
# with this line it's OK....
my $req = HTTP::Request->new(GET=>$reflink);
my $response =$ua_short->request($req);
print header;
print $response->status_line,"\n";
my $formatter = HTML::FormatText->new();
if ($response->is_success) {
my $tree =
HTML::TreeBuilder->new->parse($response->content);
my $ascii = $formatter->format($tree);
$tree->delete();
print $ascii;
}
Hi Francois.
As a general rule it's polite to reduce code as much as possible before
posting it here to ask for help: there's a lot of junk in here that
isn't relevant to the problem and just needs to be waded through before
we can give you an answer.
What's going wrong is that you have a Host header value of www.unifr.ch
but you are sending the request to linkinghub.elsevier.com, which
doesn't have a host of that name and so doesn't reply.
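If you do want to keep your own headers, just leave Host out of them. When a request has no Host header, LWP fills one in from the host part of the URL as it sends the request, so it always matches the server you are actually talking to. Below is a minimal sketch of that, assuming a trimmed-down Accept value; the sub name build_request is hypothetical, and the script only touches the network when you give it a URL on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Headers;
use HTTP::Request;
use LWP::UserAgent;

# Hypothetical helper: build a GET request carrying browser-like headers
# but with no Host header, so LWP derives Host from the URL when sending.
sub build_request {
    my ($url) = @_;
    my $h = HTTP::Headers->new(
        Accept => 'text/xml,application/xml,application/xhtml+xml,'
                . 'text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5',
    );
    return HTTP::Request->new(GET => $url, $h);
}

# Only contact a server when a URL is passed on the command line,
# e.g.  perl fetch.pl http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095
if (my $url = shift @ARGV) {
    my $ua  = LWP::UserAgent->new(timeout => 20);
    my $res = $ua->request(build_request($url));
    print $res->status_line, "\n";
}
```

The request object itself never names a host; the only host involved is the one in the URL, which is exactly what the server at the other end expects to see.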
But that's a huge amount of code just to fetch a web page! You may need
some of that stuff but I can't see how you would want all of it. How
about just
my $ua = LWP::UserAgent->new;
my $resp =
$ua->get('http://linkinghub.elsevier.com/retrieve/pii/S0020138307000095');
which seems to me to do the same thing.
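If the site really does need cookies to survive its redirects, a cut-down version that keeps only that, a timeout and a browser-like User-Agent might look like the sketch below. LWP::UserAgent follows redirects for GET requests by default, and with a cookie jar attached it stores cookies from each response and sends them back on the next hop. This is an assumption about what the site needs rather than a tested recipe for it; the sub name make_ua is hypothetical and the in-memory jar replaces the Netscape cookie file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Cookies;
use LWP::UserAgent;

# Hypothetical helper: a user agent with only the pieces that might matter
# for this site. Note that no Host header is ever set by hand.
sub make_ua {
    my $ua = LWP::UserAgent->new(
        cookie_jar => HTTP::Cookies->new,  # in-memory jar, kept across redirects
        timeout    => 20,
        agent      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.9)'
                    . ' Gecko/20071025 Firefox/2.0.0.9',
    );
    # Sent with every request made by this agent.
    $ua->default_header(Accept => 'text/html,*/*;q=0.5');
    return $ua;
}

# Only fetch when a URL is given on the command line.
if (@ARGV) {
    my $resp = make_ua()->get($ARGV[0]);
    print $resp->status_line, "\n";
    print $resp->decoded_content if $resp->is_success;
}
```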
HTH,
Rob
--
http://learn.perl.org/