Re: How to handle the Redirected using Scrappy Module

John SJ Anderson Tue, 31 May 2011 18:41:47 -0700

On Tue, May 31, 2011 at 05:05, Chris Nehren
<c.nehren/beginn...@shadowcat.co.uk> wrote:
> On Tue, May 24, 2011 at 06:25:53 -0700 , Ambuli wrote:
>> I am try to crawl a webpage that one is redirected to another.
>> I am using Scrappy module for crawling process.
>> I am using version 0.94111370 (Updated version).
>> Any one suggest me to handle the Redirect.
>
> What do you mean by "handle the Redirect"? I'm afraid your question
> isn't clear.
>


I'm assuming that the OP wants to know whether the web request was
redirected via a 301 or a 302...

It looks like Scrappy handles such redirects transparently, but
provides the 'request_denied' method as a flag that can be checked.
Here's some sample code that uses a page on one of my domains that
gives a 301:


--cut--
#! /opt/perl/bin/perl

use strict;
use warnings;
use 5.010;

use Scrappy;

my $s = Scrappy->new;
$s->get( 'http://genehack.org/about' );

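# page_status reports the status of the final response, after any redirects
say "Status: ", $s->page_status;
say "Denied: ", $s->request_denied;

# redirects() returns the intermediate responses, oldest first; the list
# is empty when the request was not redirected, so guard the lookup
my @redirects = $s->response->redirects;
say "Original URL: ", $redirects[0]->request->url if @redirects;
say "Fetched URL:  ", $s->response->request->url;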
--cut--

Running this produces:

$ ./try.pl
Status: 200
Denied: 1
Original URL: http://genehack.org/about
Fetched URL:  http://genehack.net/about/

As you can see, the status code is reported as 200 even though the
request was redirected along the way.

The 'response' method on the Scrappy object returns an HTTP::Response
object. You should read the documentation for that module to
understand what the last three lines in my script are doing. You'll
need to understand that in order to detect redirects reliably
yourself.
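
To make the HTTP::Response side of this more concrete, here is a rough
sketch that walks a redirect chain without touching the network. It
assumes only that you have an HTTP::Response object in hand (which is
what Scrappy's 'response' method appears to give you). The
'redirect_chain' helper is a name I made up for illustration, and the
two responses are built by hand from the URLs in the output above
rather than fetched:

```perl
use strict;
use warnings;
use 5.010;

use HTTP::Request;
use HTTP::Response;

# Hypothetical helper (not part of Scrappy or LWP): returns one
# "CODE URL" string per hop, oldest first, ending with the final response.
sub redirect_chain {
    my ($response) = @_;
    return map { $_->code . ' ' . $_->request->uri }
        ( $response->redirects, $response );
}

# Build a two-hop chain by hand so the example runs offline.
# A real user agent sets 'previous' for you when it follows a redirect.
my $first = HTTP::Response->new( 301, 'Moved Permanently' );
$first->request( HTTP::Request->new( GET => 'http://genehack.org/about' ) );

my $final = HTTP::Response->new( 200, 'OK' );
$final->request( HTTP::Request->new( GET => 'http://genehack.net/about/' ) );
$final->previous($first);

my @chain = redirect_chain($final);
say for @chain;
say 'Redirected: ', ( @chain > 1 ? 'yes' : 'no' );
```

With a live Scrappy object you would presumably pass $s->response to
the helper instead of the hand-built $final. A chain longer than one
entry means at least one redirect happened, whatever the final status
code says.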

chrs,
john.

-- 
To unsubscribe, e-mail: beginners-unsubscr...@perl.org
For additional commands, e-mail: beginners-h...@perl.org
http://learn.perl.org/
