[Bug-wget] wget returns HTTP 302 found and does not download all content of webpage

Umair Thu, 05 Jan 2012 05:15:45 -0800

Hi,

 I am using wget to download some webpages that involve a redirect (HTTP
302). It seems that wget doesn’t support URL redirects, or maybe I am
missing some option. For example, at my location, www.google.com redirects
to www.google.de.


 Now, if I use the URL http://www.google.com as the argument to the wget
command, I get the following output:

********************************************************************************************************************
user:/test> wget -S -t 1 -nd -E --user-agent=Mozilla/4.0 --no-check-certificate -4 -e robots=off -p http://www.google.com

asking libproxy about url 'http://www.google.com/'
libproxy suggest to use 'direct://'
--2012-01-04 16:03:57-- http://www.google.com/
Resolving www.google.com... 173.194.69.103, 173.194.69.104, 173.194.69.105,
...
Connecting to www.google.com|173.194.69.103|:80... connected.
HTTP request sent, awaiting response...
  HTTP/1.0 302 Found
  Location: http://www.google.de/
  Cache-Control: private
  Content-Type: text/html; charset=UTF-8
  Set-Cookie:
PREF=ID=719837320fac89a6:FF=0:TM=1325689437:LM=1325689437:S=Ix3mGRG5_MVHxwCB;
expires=Fri, 03-Jan-2014 15:03:57 GMT; path=/; domain=.google.com
  Date: Wed, 04 Jan 2012 15:03:57 GMT
  Server: gws
  Content-Length: 218
  X-XSS-Protection: 1; mode=block
  X-Frame-Options: SAMEORIGIN
  Connection: Keep-Alive
Location: http://www.google.de/ [following]
asking libproxy about url 'http://www.google.de/'
libproxy suggest to use 'direct://'
--2012-01-04 16:03:57-- http://www.google.de/
Resolving www.google.de... 173.194.69.99, 173.194.69.103, 173.194.69.104,
...
Reusing existing connection to www.google.com:80.
HTTP request sent, awaiting response...
  HTTP/1.0 200 OK
  Date: Wed, 04 Jan 2012 15:03:57 GMT
  Expires: -1
  Cache-Control: private, max-age=0
  Content-Type: text/html; charset=ISO-8859-1
  Set-Cookie:
PREF=ID=c3a1a61eef5a50b6:FF=0:TM=1325689437:LM=1325689437:S=hkRHrWuLAfD0GFMv;
expires=Fri, 03-Jan-2014 15:03:57 GMT; path=/; domain=.google.de
  Set-Cookie:
NID=54=KzclGc-p4pi7KOE1thfOBpoHJzV8MLLuPAxmyTK9GliHKcGnjGqmVExySo-N2aI-vzuN9iSoTN4f9D5TPBbQY2LTihcbh5Hu49nUaKsfJXTNwYdiSKHVTSJoVSoJ9syB;
expires=Thu, 05-Jul-2012 15:03:57 GMT; path=/; domain=.google.de; HttpOnly
  P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."
  Server: gws
  X-XSS-Protection: 1; mode=block
  X-Frame-Options: SAMEORIGIN
Length: unspecified [text/html]
Saving to: “index.html.1.5.html”

    [ <=> ] 8,892 --.-K/s in 0.02s

2012-01-04 16:03:57 (440 KB/s) - “index.html.1.5.html” saved [8892]

FINISHED --2012-01-04 16:03:57--
Downloaded: 1 files, 8.7K in 0.02s (440 KB/s)
*******************************************************************************************************************

If I use http://www.google.de as the URL, wget successfully downloads the
web page, with the following result:

Downloaded: 6 files, 55K in 0.06s (849 KB/s)

Please note the difference in downloaded content between the redirect and
no-redirect cases. The same happens with any other URL that involves a
redirect with HTTP status code 302, i.e. only 1 HTML file is downloaded in
case of a redirect.
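
Since fetching the redirect target directly gives me all 6 files, one
possible workaround (only a sketch, not a confirmed fix) is to read the
redirect target out of the "wget -S" log and pass that URL to wget instead.
The helper name last_redirect_target is made up for illustration, and the
commented usage assumes the server sends an absolute URL in its Location
header, as in the log above:

```shell
#!/bin/sh
# Hypothetical helper: read a `wget -S` log on stdin and print the target
# of the last "Location:" header seen. Prints nothing if no redirect occurred.
last_redirect_target() {
    awk '$1 == "Location:" { target = $2 }
         END { if (target != "") print target }'
}

# Possible usage (needs network access, so it is left commented out):
#   url=http://www.google.com
#   target=$(wget -S --spider -t 1 "$url" 2>&1 | last_redirect_target)
#   wget -p -E -nd "${target:-$url}"
```

If no Location header is found, the usage above falls back to the original
URL, so it behaves like a plain wget call for pages without a redirect.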

Kindly suggest a possible solution to this problem. Is it really a bug, or
am I missing something?

P.S. I am using openSUSE 11.4 and GNU Wget 1.12.

Regards..
