(Please cc me/reply all)

No matter what I try (including explicitly limiting domains with -H and -D), wget crawls sites that are not specified on the command line.

For example, with a simple recursive fetch:
% wget -r -l 1 http://www.nytimes.com
[stopped it after 2 minutes, then]
% ls -1
homedelivery.nytimes.com/
jobmarket.nytimes.com/
listings.nytimes.com/
personal.fidelity.com/
schools.nyc.gov/
select.nytimes.com/
video.on.nytimes.com/
www.brownharrisstevens.com/
www.continental.com/
www.nytimes.com/

Shouldn't it be fetching pages *only* from nytimes.com? It also appears to be crawling those other sites, not just fetching the single pages linked from www.nytimes.com, which seems to go against the -l switch.
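To see at a glance which of the downloaded directories fall outside nytimes.com, a small shell check like the one here can help. It is only a sketch: it assumes it is run from the directory wget saved into and that wget's default one-directory-per-host layout is in use (as in the listing above). The function name off_domain is hypothetical, made up for illustration.

```shell
# Hypothetical helper: print top-level directories (one per host, as
# wget -r creates them) whose names are not nytimes.com or a subdomain of it.
# Assumes the current directory is where wget saved its output.
off_domain() {
  for d in */; do
    [ -d "$d" ] || continue          # skip the literal "*/" if nothing matched
    host=${d%/}                      # strip the trailing slash
    case "$host" in
      nytimes.com|*.nytimes.com) ;;  # on-domain: ignore
      *) printf '%s\n' "$host" ;;    # off-domain: report it
    esac
  done
}

off_domain
```

Run against the listing above, it would report personal.fidelity.com, schools.nyc.gov, www.brownharrisstevens.com and www.continental.com.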

Thanks,
- Jesse
