Hi,

You are using a very old version of Wget; v1.12 was released in 2009, if I remember correctly.
The current version of Wget doesn't seem to have any issues with the parsing of that robots.txt. I just tried it locally and it downloads no files at all. Please update your version of Wget.

* Daniel Feenberg <feenb...@nber.org> [180514 16:51]:
>
> I have the following wget command line:
>
> wget -r http://wwwdev.nber.org/
>
> http://wwwdev.nber.org/robots.txt is:
>
> User-agent: *
> Disallow: /
>
> User-Agent: W3C-checklink
> Disallow:
>
>
> However wget fetches thousands of pages from wwwdev.nber.org. I would have
> thought nothing would be found. (This is a demonstration, obviously in real
> life I'd have a more detailed robots.txt to control the process).
>
> Obviously too, I don't understand something about wget or robots.txt. Can
> anyone help me out?
>
> This is GNU Wget 1.12 built on linux-gnu.
>
> Thank you
> Daniel Feenberg

--
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
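For what it's worth, the intended semantics of that robots.txt can be sanity-checked offline with Python's standard-library parser. This is only a sketch of what a compliant client should conclude (the "Wget" agent string here is an illustrative example; a real Wget matches the `*` group), not a description of Wget's internal parser:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt quoted above, fed to the stdlib parser line by line.
rules = [
    "User-agent: *",
    "Disallow: /",
    "",
    "User-Agent: W3C-checklink",
    "Disallow:",
]

rp = RobotFileParser()
rp.parse(rules)

# Any agent that falls under the "*" group is blocked from the whole site,
# while W3C-checklink, with an empty Disallow, may fetch anything.
print(rp.can_fetch("Wget", "http://wwwdev.nber.org/index.html"))
print(rp.can_fetch("W3C-checklink", "http://wwwdev.nber.org/index.html"))
```

A parser that honors those rules should report the first fetch as disallowed and the second as allowed, which is exactly why a recursive `wget -r` of that host should download nothing.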