> wget --user-agent "Mozilla/5.0 (Windows NT x.y; WOW64; rv:10.0) Gecko/20100101 Firefox/10.0" -e robots=off --header="accept-encoding: gzip" -p -H "www.google.com"
>
> Still only gives me 52 kb! and one file: index.html
>
> So, accept encoding seems to work, but only for the main file?

As Ángel said, the main file is gzipped, but wget can't parse gzipped content. Since it can't extract any links from the page, you just get one file (index.html). (This file could be named index.html.gz to reflect its content.)

You could decompress it with gzip -d and then feed the resulting HTML file to wget manually, like

    wget -r --force-html --input-file index.html --base www.google.com

There have been patches to support gzip encoding, but either they were half-baked or the authors did not sign the FSF copyright assignment.

*Note*
[Meanwhile, we are working on wget2. Content encodings like gzip and deflate are already built in there. lzma and bzip2 are supported as well, for even better compression (though servers don't support them out of the box yet).]

Regards, Tim
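
P.S. The manual workaround above can be sketched as a small shell script. This is only a sketch under assumptions: the helper names is_gzip and gunzip_in_place are made up for illustration, and index.html is assumed to be the file wget saved. It checks for the gzip magic bytes (1f 8b) before decompressing, so running it on a file that is already plain HTML does nothing.

```shell
#!/bin/sh
# Sketch only: the function names below are hypothetical helpers,
# not part of wget or gzip.

# Succeed if the file in $1 starts with the gzip magic bytes 1f 8b.
is_gzip() {
    [ "$(od -An -tx1 -N2 "$1" | tr -d ' \n')" = "1f8b" ]
}

# Decompress $1 in place if it is gzip data; leave it untouched otherwise.
gunzip_in_place() {
    if is_gzip "$1"; then
        gzip -dc "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    fi
}

# Intended use (needs network access, so it is left commented out):
#   gunzip_in_place index.html
#   wget -r --force-html --input-file index.html --base www.google.com
```

gzip -d on its own insists on a .gz suffix, which is why the sketch uses gzip -dc and writes to a temporary file before replacing the original.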
