Matthew Young wrote:
> Hello Micah & friends,
>
> I am trying to download content only for the domain specified in -D.
> However, running a test with:
>
> wget -r -Dwww.cnn.com http://www.cnn.com
>
> I noticed that it also creates directories for other subdomains, and
> even for other domains that cnn.com links to:
>
> money.cnn.com sportsillustrated.cnn.com www.ew.com www.time.com
> transcripts.cnn.com www.turnerstoreonline.com

Assuming you don't have other things in your ~/.wgetrc that allow these
hosts (note that Wget won't follow links to other hosts at all if you
specify only -D; you have to also specify -H), my guess would be that
these were the results of redirects. That is, they correspond to
locations on www.cnn.com that redirected to a different host.

You can check whether this is the case by examining the log. The --debug
flag is particularly helpful for producing useful logs, but in this case
it shouldn't be necessary, so long as you have the normal "verbose"
output.

> What is the way to achieve what I want, or would this be a bug?

If my hunch is correct, then I'm afraid there's no way to avoid them.
Wget does not currently provide any facility for avoiding redirects to
other sites.

> If it's a bug, is there any way to tell wget to download everything to
> the www.cnn.com directory (even if it has to download subdomain stuff)?

I'd go for "-P www.cnn.com -nH". This means, "Put all downloaded content
in www.cnn.com/, and don't generate an extra directory for hostnames."

--
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
Maintainer of GNU Wget and GNU Teseq
http://micah.cowan.name/
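For what it's worth, here is a rough sketch of how the log check and the
single-directory download suggested above might look in practice. The
helper name redirect_hosts and the log file name wget.log are made up
for illustration, and the script assumes that Wget's verbose output
reports each redirect on a line of the form "Location: URL [following]";
if your version formats its log differently, the pattern will need
adjusting.

```shell
#!/bin/sh
# Hypothetical helper: list the hosts that show up as redirect targets
# in a Wget log. Assumes verbose log lines such as
#   Location: http://money.cnn.com/ [following]
# Relative redirects (no scheme) are skipped, since they stay on the
# same host.
redirect_hosts() {
    grep -E '^Location: [a-zA-Z]+://' "$1" \
        | sed -E 's#^Location: [a-zA-Z]+://([^/: ]+).*#\1#' \
        | sort -u
}

# Usage (not run here, since it needs network access):
#
#   # Write the normal verbose output to wget.log, then list the hosts
#   # that were reached through redirects.
#   wget -r -Dwww.cnn.com -o wget.log http://www.cnn.com
#   redirect_hosts wget.log
#
#   # Put everything under www.cnn.com/ without per-host directories.
#   wget -r -P www.cnn.com -nH http://www.cnn.com
```

If the hosts printed by redirect_hosts match the unexpected directories,
that would support the redirect explanation; it does not prevent the
redirects from being followed.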
