On Thu, Aug 18, 2011 at 4:42 PM, Paul Marquess
<paul.marqu...@btinternet.com> wrote:
>> From: Peng Yu [mailto:pengyu...@gmail.com]
>> Sent: 18 August 2011 15:19
>> To: Paul Marquess
>> Cc: libwww@perl.org; Andy Lester
>> Subject: Re: Subclass of both WWW::Mechanize::GZip and
>> WWW::Mechanize::Sleepy
>>
>> On Thu, Aug 18, 2011 at 7:43 AM, Paul Marquess
>> <paul.marqu...@btinternet.com> wrote:
>> >>
>> >>
>> >> -----Original Message-----
>> >> From: Peng Yu [mailto:pengyu...@gmail.com]
>> >> Sent: 17 August 2011 23:32
>> >> To: Andy Lester
>> >> Cc: libwww@perl.org
>> >> Subject: Re: Subclass of both WWW::Mechanize::GZip and
>> >> WWW::Mechanize::Sleepy
>> >>
>> >> On Wed, Aug 17, 2011 at 5:19 PM, Andy Lester <a...@petdance.com> wrote:
>> >> >
>> >> > On Aug 17, 2011, at 5:05 PM, Peng Yu wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I'd like to use the features of both WWW::Mechanize::GZip and
>> >> > WWW::Mechanize::Sleepy. Could anybody let me know which I should use?
>> >> > Is there a subclass of both?
>> >> >
>> >> > If ::GZip just ungzips content as it comes in, that's now a
>> >> > built-in feature of Mech.  And I know that the functionality of
>> >> > ::Sleepy is pretty trivial, and could be incorporated into your own
>> >> > subclass of Mech with minimal effort.
>> >> > xoa
>> >>
>> >> According to man WWW::Mechanize, there is an internal method.
>> >>
>> >>    $mech->_modify_request( $req )
>> >>        Modifies a HTTP::Request before the request is sent out, for both
>> >>        GET and POST requests.
>> >>
>> >>        We add a "Referer" header, as well as header to note that we can
>> >>        accept gzip encoded content, if Compress::Zlib is installed.
>> >>
>> >>
>> >> I have Compress::Zlib installed, but it seems that I have to call this
>> >> method explicitly to modify the request. I don't want to do that, as it
>> >> is an internal method.
>> >>
>> >> Currently, I do the following, but it wastes some bandwidth. Would you
>> >> please let me know how to use WWW::Mechanize with gzip?
>> >>
>> >> $browser->add_header('Accept-Encoding' => 'identity');
>> >> my $response = $browser->get($uri);
>> >
>> > Looking at the WWW::Mechanize code, it looks like it already handles
>> > gzip.
>> >
>> > If you want an origin server to return gzipped content you can't get
>> > around having to add this header to the HTTP request
>> >
>> >        Accept-Encoding: gzip
>> >
>> > This is what WWW::Mechanize will do if it detects Compress::Zlib is
>> > available.
>> >
>> > Note - by using the "identity" content-encoding you are requesting
>> > that the origin server does not return gzipped content. I assume that
>> > isn't what you want to happen.
>> >
>> > If WWW::Mechanize handles gzip content for you, does that mean that
>> > WWW::Mechanize::Sleepy meets your requirements?
>>
>> I don't necessarily need the server to return gzipped content. But if the
>> server does (and the server that I am trying returns gzipped content by
>> default, unless I tell it not to), I need to somehow get the gunzipped
>> content. As far as I can see, WWW::Mechanize::Sleepy doesn't
>> automatically gunzip the content, because when I print
>> $response->content, I get unreadable characters, which are apparently
>> gzipped data.
>>
>> Besides the approach that I use now (i.e., telling the server not to send
>> me gzipped content), would you please show me code that gets the content
>> gunzipped automatically?
>
> Are you sure you have Compress::Zlib installed? - what you want to do should
> just work
>
> This script
>
>    use WWW::Mechanize::Sleepy;
>
>    my $mech = WWW::Mechanize::Sleepy->new( sleep => 1 );
>
>    $mech->get( "http://bbc.co.uk" );
>    print $mech->dump_headers;
>    print "\n";
>    print $mech->content;

I see where my confusion is. I should access the 'content' of $mech.
But I accessed the 'content' of the response returned by 'get', which
is not gunzipped. Problem solved. Thanks!
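
For the record, here is a minimal offline sketch of the difference. It
assumes HTTP::Response (from the HTTP::Message distribution that LWP
depends on) and the core IO::Compress::Gzip module are available. The
response is built by hand rather than fetched, so the HTML string and the
variable names are illustrative only. As far as I can tell, $mech->content
is filled in from the response's decoded_content, which is why it comes
back gunzipped while the raw 'content' of the response does not:

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Build a gzip-encoded response by hand so no network access is needed.
# The body below is a made-up placeholder, not real server output.
my $html = "<html><body>hello</body></html>";
gzip( \$html => \my $gzipped ) or die "gzip failed: $GzipError";

my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $gzipped,
);

# content() returns the body exactly as received: still gzip bytes.
my $raw = $response->content;

# decoded_content() undoes the Content-Encoding (and any charset).
my $decoded = $response->decoded_content;

print "raw starts with gzip magic: ",
    ( substr( $raw, 0, 2 ) eq "\x1f\x8b" ? "yes" : "no" ), "\n";
print "decoded matches original:   ",
    ( $decoded eq $html ? "yes" : "no" ), "\n";
```

With a live fetch the same distinction should hold between
$mech->get($uri)->content (raw) and $mech->content (decoded).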

> gave me this output (I've truncated it, but the key point is that the
> content is uncompressed). Note the presence of the Content-Encoding header.
>
> Cache-Control: max-age=60, private
> Connection: close
> Date: Thu, 18 Aug 2011 21:36:29 GMT
> Age: 8
> ETag: "1313703381"
> Server: Apache
> Vary: X-Ip-is-advertise-combined
> Content-Encoding: gzip
> Content-Length: 25553
> Content-Type: text/html
> Client-Date: Thu, 18 Aug 2011 21:36:44 GMT
> Client-Peer: 212.58.246.95:80
> Client-Response-Num: 1
> Keep-Alive: timeout=5, max=95
> X-Lb-Nocache: true
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/x
> html1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">
> <head profile="http://dublincore.org/documents/dcq-html/">
> <meta name="dcterms.created" content="2011-07-14T14:56:13Z" />
> <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
> <meta name="description" content="Breaking news, sport, TV, radio and a
> whole lo
> t more. The BBC informs, educates and entertains - wherever you are,
> whatever yo



-- 
Regards,
Peng
