On Thu, Aug 18, 2011 at 4:42 PM, Paul Marquess <paul.marqu...@btinternet.com> wrote:
>> From: Peng Yu [mailto:pengyu...@gmail.com]
>> Sent: 18 August 2011 15:19
>> To: Paul Marquess
>> Cc: libwww@perl.org; Andy Lester
>> Subject: Re: Subclass of both WWW::Mechanize::GZip and WWW::Mechanize::Sleepy
>>
>> On Thu, Aug 18, 2011 at 7:43 AM, Paul Marquess <paul.marqu...@btinternet.com> wrote:
>> >
>> >> -----Original Message-----
>> >> From: Peng Yu [mailto:pengyu...@gmail.com]
>> >> Sent: 17 August 2011 23:32
>> >> To: Andy Lester
>> >> Cc: libwww@perl.org
>> >> Subject: Re: Subclass of both WWW::Mechanize::GZip and WWW::Mechanize::Sleepy
>> >>
>> >> On Wed, Aug 17, 2011 at 5:19 PM, Andy Lester <a...@petdance.com> wrote:
>> >> >
>> >> > On Aug 17, 2011, at 5:05 PM, Peng Yu wrote:
>> >> >
>> >> > Hi,
>> >> >
>> >> > I'd like to use the features of both WWW::Mechanize::GZip and
>> >> > WWW::Mechanize::Sleepy. Could anybody let me know what I should use?
>> >> > Is there a subclass of both?
>> >> >
>> >> > If ::GZip just ungzips content as it comes in, that's now a
>> >> > built-in feature of Mech. And I know that the functionality of
>> >> > ::Sleepy is pretty trivial, and could be incorporated into your own
>> >> > subclass of Mech with minimal effort.
>> >> > xoa
>> >>
>> >> According to man WWW::Mechanize, there is an internal method.
>> >>
>> >> $mech->_modify_request( $req )
>> >>     Modifies a HTTP::Request before the request is sent out, for both
>> >>     GET and POST requests.
>> >>
>> >>     We add a "Referer" header, as well as header to note that we can
>> >>     accept gzip encoded content, if Compress::Zlib is installed.
>> >>
>> >> I have Compress::Zlib installed. But it seems that I have to explicitly
>> >> call it to modify the request. But I don't want to do so, as it is an
>> >> internal method.
>> >>
>> >> Currently, I do the following. But this wastes some bandwidth.
>> >> Would you please let me know how to use WWW::Mechanize with gzip?
>> >>
>> >> $browser->add_header('Accept-Encoding' => 'identity');
>> >> my $response = $browser->get($uri);
>> >
>> > Looking at the WWW::Mechanize code, it looks like it already handles gzip.
>> >
>> > If you want an origin server to return gzipped content you can't get
>> > around having to add this header to the HTTP request
>> >
>> > Accept-Encoding: gzip
>> >
>> > This is what WWW::Mechanize will do if it detects Compress::Zlib is
>> > available.
>> >
>> > Note - by using the "identity" content-encoding you are requesting
>> > that the origin server does not return gzipped content. I assume that
>> > isn't what you want to happen.
>> >
>> > If WWW::Mechanize handles gzip content for you, does that mean that
>> > WWW::Mechanize::Sleepy meets your requirements?
>>
>> I don't necessarily need the server to return gzipped content. But if the
>> server does (and the server that I try returns gzipped content by default,
>> unless I tell it not to do so), I need to somehow get the gunzipped
>> content. As far as I can see, WWW::Mechanize::Sleepy doesn't
>> automatically gunzip the content, because when I print
>> $response->content, I get some unreadable characters, which are apparently
>> gzipped.
>>
>> Besides the way that I use (i.e., telling the server not to send me
>> gzipped content), would you please let me know what code
>> automatically gets the content gunzipped?
>
> Are you sure you have Compress::Zlib installed? What you want to do should
> just work.
>
> This script
>
> use WWW::Mechanize::Sleepy;
>
> my $mech = WWW::Mechanize::Sleepy->new( sleep => 1 );
>
> $mech->get( "http://bbc.co.uk" );
> print $mech->dump_headers;
> print "\n";
> print $mech->content;

I see where my confusion is. I should access the 'content' of $mech. But I
accessed the 'content' of the response returned by 'get', which is not
gunzipped. Problem solved. Thanks!

> gave me this output (I've truncated it, but the key point is that the
> content is uncompressed). Note the presence of the Content-Encoding header.
>
> Cache-Control: max-age=60, private
> Connection: close
> Date: Thu, 18 Aug 2011 21:36:29 GMT
> Age: 8
> ETag: "1313703381"
> Server: Apache
> Vary: X-Ip-is-advertise-combined
> Content-Encoding: gzip
> Content-Length: 25553
> Content-Type: text/html
> Client-Date: Thu, 18 Aug 2011 21:36:44 GMT
> Client-Peer: 212.58.246.95:80
> Client-Response-Num: 1
> Keep-Alive: timeout=5, max=95
> X-Lb-Nocache: true
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-GB" lang="en-GB">
> <head profile="http://dublincore.org/documents/dcq-html/">
> <meta name="dcterms.created" content="2011-07-14T14:56:13Z" />
> <meta http-equiv="content-type" content="text/html;charset=UTF-8" />
> <meta name="description" content="Breaking news, sport, TV, radio and a
> whole lot more. The BBC informs, educates and entertains - wherever you are,
> whatever yo

--
Regards,
Peng
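
For anyone finding this thread later, the difference between the two 'content' calls can be reproduced without a network connection. The following is a minimal sketch, not code from the thread: it assumes HTTP::Response (from HTTP::Message) and IO::Compress::Gzip are installed, and the response and its HTML body are built by hand purely for illustration. It shows that content() returns the body as received (still gzipped), while decoded_content() undoes the Content-Encoding.

```perl
use strict;
use warnings;
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical page body, gzipped locally to stand in for a server reply.
my $html = "<html><body>hello</body></html>";
gzip(\$html => \my $gzipped) or die "gzip failed: $GzipError";

# Hand-built response carrying the same Content-Encoding header a server would send.
my $response = HTTP::Response->new(
    200, 'OK',
    [
        'Content-Type'     => 'text/html; charset=UTF-8',
        'Content-Encoding' => 'gzip',
    ],
    $gzipped,
);

# content() hands back the raw bytes, i.e. the "unreadable characters".
my $raw = $response->content;

# decoded_content() reverses the Content-Encoding and decodes the charset.
my $decoded = $response->decoded_content;

print "raw body starts with the gzip magic bytes\n"
    if substr($raw, 0, 2) eq "\x1f\x8b";
print "decoded body: $decoded\n";
```

In the output quoted above, $mech->content gave the uncompressed text, whereas the content of the response object returned by get() did not; calling decoded_content on that response object should give the readable text as well.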