(Please cc me/reply all)
No matter what I try (including explicitly limiting domains with -H
and -D), wget crawls sites that are not specified on the command line.
For example, a simple:
% wget -r -l 1 http://www.nytimes.com
[stopped after about 2 minutes, and then]
% ls -1
homedelivery.nytimes.com/
jobmarket.nytimes.com/
listings.nytimes.com/
personal.fidelity.com/
schools.nyc.gov/
select.nytimes.com/
video.on.nytimes.com/
www.brownharrisstevens.com/
www.continental.com/
www.nytimes.com/
Shouldn't it be fetching things *only* from nytimes.com? It also
appears to be recursively crawling those other sites, not just grabbing
single linked pages, which seems to go against the -l switch.
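For completeness, the domain-limited variants I tried look roughly like
the following (a sketch from memory; the exact domain list passed to -D
is just what I'd expect per my reading of the man page):

```shell
# Attempt 1: recursion depth 1, no host spanning
# (the documentation says staying on the start host is the default):
wget -r -l 1 http://www.nytimes.com

# Attempt 2: restrict the crawl to nytimes.com with -D
# (-D takes a comma-separated domain list and, as I understand it,
# only filters hosts when -H has enabled host spanning):
wget -r -l 1 -H -D nytimes.com http://www.nytimes.com
```

Both runs still left me with the directory listing shown above.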
Thanks,
- Jesse