On Tue, May 31, 2011 at 05:05, Chris Nehren <c.nehren/beginn...@shadowcat.co.uk> wrote:
> On Tue, May 24, 2011 at 06:25:53 -0700, Ambuli wrote:
>> I am try to crawl a webpage that one is redirected to another.
>> I am using Scrappy module for crawling process.
>> I am using version 0.94111370 (Updated version).
>> Any one suggest me to handle the Redirect.
>
> What do you mean by "handle the Redirect"? I'm afraid your question
> isn't clear.

I'm assuming that the OP wants to know whether the web request was
redirected via a 301 or a 302. It looks like Scrappy handles such
redirects transparently, but provides the 'request_denied' method as a
flag that can be checked. Here's some sample code that uses a page on
one of my domains that gives a 301:

--cut--
#! /opt/perl/bin/perl

use strict;
use warnings;
use 5.010;

use Scrappy;

my $s = Scrappy->new;

# Scrappy appears to follow the 301 itself, so we end up on the final page.
$s->get( 'http://genehack.org/about' );

say "Status: ", $s->page_status;
say "Denied: ", $s->request_denied;

# The final response carries the redirect responses that led to it,
# oldest first. The list is empty when no redirect happened.
my @redirects = $s->response->redirects;

say "Original URL: ", $redirects[0]->request->url if @redirects;
say "Fetched URL: ", $s->response->request->url;
--cut--

Running this produces:

$ ./try.pl
Status: 200
Denied: 1
Original URL: http://genehack.org/about
Fetched URL: http://genehack.net/about/

As you can see, the status code is reported as 200, even though a
redirect took place. The 'response' method on the Scrappy object
returns an HTTP::Response object. Its 'redirects' method returns the
intermediate responses that led to the final one, and the 'request'
method on each response gives the request (and so the URL) that
produced it. You should read the documentation for HTTP::Response to
understand what the last several lines in my script are doing; you'll
need that in order to reliably detect redirects yourself.

chrs,
john.
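P.S. If it helps to see the HTTP::Response side on its own, here is a minimal offline sketch that does not use Scrappy or the network at all. It builds a two-step response chain by hand and then inspects it the same way the script above does. The helper name describe_redirects and the example.org / example.net URLs are made up for illustration, and I'm assuming the HTTP::Message distribution (HTTP::Request, HTTP::Response) is installed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use 5.010;

use HTTP::Request;
use HTTP::Response;

# describe_redirects() is a hypothetical helper, not part of Scrappy or LWP.
# It summarises whether a response was reached via redirects.
sub describe_redirects {
    my ($response) = @_;

    # redirects() walks the ->previous chain and returns it oldest first.
    my @hops = $response->redirects;

    # Stringify the URLs so the result holds plain strings.
    my $final    = "" . $response->request->uri;
    my $original = @hops ? "" . $hops[0]->request->uri : $final;

    return {
        redirected => ( @hops ? 1 : 0 ),
        codes      => [ map { $_->code } @hops ],
        original   => $original,
        final      => $final,
    };
}

# Hand-built stand-in for what a user agent returns after following a 301.
# The URLs are placeholders.
my $first = HTTP::Response->new( 301, 'Moved Permanently' );
$first->request( HTTP::Request->new( GET => 'http://example.org/about' ) );

my $last = HTTP::Response->new( 200, 'OK' );
$last->request( HTTP::Request->new( GET => 'http://example.net/about/' ) );
$last->previous($first);

my $info = describe_redirects($last);
say "Redirected: ",   $info->{redirected};
say "Codes: ",        join( ',', @{ $info->{codes} } );
say "Original URL: ", $info->{original};
say "Fetched URL: ",  $info->{final};
```

With a real fetch you would pass in whatever HTTP::Response object your crawler hands back instead of the hand-built $last. The point is only that the final response can report 200 while the 301 is still visible in its redirect chain.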