Hi,

You are using a very old version of Wget.  v1.12 was released in 2009 if I
remember correctly. 

Your reading of that robots.txt is correct, by the way: the first group
disallows everything for every agent, and only W3C-checklink is exempted.
The current version of Wget has no trouble parsing it. I just tried it
locally and it downloads no files at all.
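
If you want to verify against your own build, something like this should
do (a rough sketch; -d enables debug output, and I am only guessing that
grepping for "robots" is enough to catch the relevant lines):

    $ wget --version | head -n 1
    $ wget -r -d http://wwwdev.nber.org/ 2>&1 | grep -i robots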

Please update your version of Wget.
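
For completeness, the knob that controls this behaviour is the "robots"
setting, which is on by default. If you ever need a recursive fetch that
deliberately ignores robots.txt on a host you control, that would be:

    $ wget -r -e robots=off http://wwwdev.nber.org/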

* Daniel Feenberg <feenb...@nber.org> [180514 16:51]:
>
> I have the following wget command line:
> 
>    wget -r http://wwwdev.nber.org/
> 
> http://wwwdev.nber.org/robots.txt  is:
> 
>   User-agent: *
>   Disallow: /
> 
>   User-Agent: W3C-checklink
>   Disallow:
> 
> 
> However, wget fetches thousands of pages from wwwdev.nber.org. I would have
> thought nothing would be found. (This is a demonstration; obviously in real
> life I'd have a more detailed robots.txt to control the process.)
> 
> Obviously too, I don't understand something about wget or robots.txt. Can
> anyone help me out?
> 
> This is GNU Wget 1.12 built on linux-gnu.
> 
> Thank you
> Daniel Feenberg
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
