Thanks for the insights, and for working on the next version.

andreas

On Wed, Sep 23, 2015 at 3:10 AM, Tim Ruehsen <[email protected]> wrote:
> > wget --user-agent "Mozilla/5.0 (Windows NT x.y; WOW64; rv:10.0)
> > Gecko/20100101 Firefox/10.0" -e robots=off
> > --header="accept-encoding: gzip" -p -H "www.google.com"
> >
> > Still only gives me 52 kb! and one file: index.html
> >
> > So, accept encoding seems to work, but only for the main file?
>
> As Ángel said, the main file is gzipped but wget can't parse it.
> That's why you just get one file (index.html). (This file could be named
> index.html.gz to reflect the content.)
> You could manually gzip -d it and feed the resulting HTML file to wget
> manually, like wget -r --force-html --input-file index.html --base
> www.google.com
>
> There have been patches to support gzip encoding, but either they were
> half-baked or the authors did not sign the FSF copyright assignment.
>
> *Note*
> [Meanwhile, we are working on wget2. Content encodings like gzip and
> deflate are already built in here. Also lzma and bzip2 for even better
> compression (but servers don't support it out-of-the-box yet).]
>
> Regards, Tim
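
For anyone following along, the manual workaround quoted above could be scripted roughly as below. This is only a sketch: `gunzip_in_place` is a made-up helper name (not part of wget or gzip), and it assumes the downloaded index.html really holds gzip data, as described in the thread.

```shell
#!/bin/sh
# Hypothetical helper: decompress FILE in place if it holds gzip data,
# and leave it untouched otherwise.
gunzip_in_place() {
    file=$1
    # gzip -t checks integrity and fails on input that is not gzip data.
    # Reading from stdin avoids any dependence on a .gz file suffix.
    if gzip -t < "$file" 2>/dev/null; then
        gzip -dc < "$file" > "$file.tmp" && mv "$file.tmp" "$file"
    fi
}

# Usage, following the command suggested in the quoted mail
# (needs network access, so it is left commented out here):
#   gunzip_in_place index.html
#   wget -r --force-html --input-file index.html --base www.google.com
```

The helper only handles the decompression step; the wget options in the usage comment are taken as-is from the quoted suggestion.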
