> From: Peng Yu [mailto:pengyu...@gmail.com]
> Sent: 18 August 2011 15:19
> To: Paul Marquess
> Cc: libwww@perl.org; Andy Lester
> Subject: Re: Subclass of both WWW::Mechanize::GZip and WWW::Mechanize::Sleepy
>
> On Thu, Aug 18, 2011 at 7:43 AM, Paul Marquess <paul.marqu...@btinternet.com> wrote:
> >>
> >> -----Original Message-----
> >> From: Peng Yu [mailto:pengyu...@gmail.com]
> >> Sent: 17 August 2011 23:32
> >> To: Andy Lester
> >> Cc: libwww@perl.org
> >> Subject: Re: Subclass of both WWW::Mechanize::GZip and WWW::Mechanize::Sleepy
> >>
> >> On Wed, Aug 17, 2011 at 5:19 PM, Andy Lester <a...@petdance.com> wrote:
> >> >
> >> > On Aug 17, 2011, at 5:05 PM, Peng Yu wrote:
> >> >
> >> > Hi,
> >> >
> >> > I'd like to use the features of both WWW::Mechanize::GZip and
> >> > WWW::Mechanize::Sleepy. Could anybody let me know what I should use?
> >> > Is there a subclass of both?
> >> >
> >> > If ::GZip just ungzips content as it comes in, that's now a
> >> > built-in feature of Mech. And I know that the functionality of
> >> > ::Sleepy is pretty trivial, and could be incorporated into your own
> >> > subclass of Mech with minimal effort.
> >> > xoa
> >>
> >> According to man WWW::Mechanize, there is an internal method.
> >>
> >>     $mech->_modify_request( $req )
> >>         Modifies a HTTP::Request before the request is sent out, for
> >>         both GET and POST requests.
> >>
> >>         We add a "Referer" header, as well as header to note that we
> >>         can accept gzip encoded content, if Compress::Zlib is installed.
> >>
> >> I have Compress::Zlib installed. But it seems that I have to call it
> >> explicitly to modify the request, and I don't want to do so as it is
> >> an internal method.
> >>
> >> Currently, I do the following. But this wastes some bandwidth. Would
> >> you please let me know how to use WWW::Mechanize with gzip?
> >>
> >>     $browser->add_header('Accept-Encoding' => 'identity');
> >>     my $response = $browser->get($uri);
> >
> > Looking at the WWW::Mechanize code, it looks like it already handles gzip.
> >
> > If you want an origin server to return gzipped content you can't get
> > around having to add this header to the HTTP request
> >
> >     Accept-Encoding: gzip
> >
> > This is what WWW::Mechanize will do if it detects Compress::Zlib is
> > available.
> >
> > Note - by using the "identity" content-encoding you are requesting
> > that the origin server does not return gzipped content. I assume that
> > isn't what you want to happen.
> >
> > If WWW::Mechanize handles gzip content for you, does that mean that
> > WWW::Mechanize::Sleepy meets your requirements?
>
> I don't necessarily need the server to return gzipped content. But if
> the server does (and the server that I am trying returns gzipped
> content by default, unless I tell it not to), I need to somehow get
> the gunzipped content. As far as I can see, WWW::Mechanize::Sleepy
> doesn't automatically gunzip the content, because when I print
> $response->content, I get some unreadable characters, which are
> apparently the gzipped data.
>
> Besides the way that I use now (i.e., telling the server not to send
> me gzipped content), would you please let me know what code will
> automatically gunzip the content?

Are you sure you have Compress::Zlib installed? What you want to do should just work.

This script

    use WWW::Mechanize::Sleepy;

    my $mech = WWW::Mechanize::Sleepy->new( sleep => 1 );
    $mech->get( "http://bbc.co.uk" );
    print $mech->dump_headers;
    print "\n";
    print $mech->content;

gave me this output (I've truncated it, but the key point is that the content is uncompressed). Note the presence of the Content-Encoding header.

    Cache-Control: max-age=60, private
    Connection: close
    Date: Thu, 18 Aug 2011 21:36:29 GMT
    Age: 8
    ETag: "1313703381"
    Server: Apache
    Vary: X-Ip-is-advertise-combined
    Content-Encoding: gzip
    Content-Length: 25553
    Content-Type: text/html
    Client-Date: Thu, 18 Aug 2011 21:36:44 GMT
    Client-Peer: 212.58.246.95:80
    Client-Response-Num: 1
    Keep-Alive: timeout=5, max=95
    X-Lb-Nocache: true

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">
    <head profile="http://dublincore.org/documents/dcq-html/">
    <meta name="dcterms.created" content="2011-07-14T14:56:13Z" />
    <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
    <meta name="description" content="Breaking news, sport, TV, radio and a whole lot more. The BBC informs, educates and entertains - wherever you are, whatever yo
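
If the body still comes back unreadable on your side, it may help to compare the raw and decoded bodies of a response directly. The following is only a minimal sketch, not taken from the WWW::Mechanize source: it assumes HTTP::Response and IO::Compress::Gzip are installed, builds a response by hand so that no network access is needed, and uses a made-up $html body purely for illustration.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical page body, standing in for what a server would send.
my $html = "<html><body>Hello, gzip</body></html>";

# Compress it the way a server honouring "Accept-Encoding: gzip" would.
my $gzipped;
gzip( \$html => \$gzipped ) or die "gzip failed: $GzipError";

# Build the response by hand; the headers mimic a gzip-encoded reply.
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html', 'Content-Encoding' => 'gzip' ],
    $gzipped,
);

# content() is the body as it arrived on the wire (still gzipped);
# decoded_content() undoes the Content-Encoding.
my $raw     = $response->content;
my $decoded = $response->decoded_content;

print "raw body is still gzipped\n" if substr( $raw, 0, 2 ) eq "\x1f\x8b";
print "decoded: $decoded\n";
```

With a real Mech object the same check should be possible through $mech->response->decoded_content. If that gives readable HTML while your own print does not, the likely difference is that you are printing the raw $response->content.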