Summary: wget -r or --mirror with robots=off should still
download the robots.txt file
Project: GNU Wget
Submitted by: None
Submitted on: Tue 23 Jul 2019 03:45:32 PM UTC
Severity: 3 - Normal
Priority: 5 - Normal
Assigned to: None
Discussion Lock: Any
Operating System: None
Fixed Release: None
Planned Release: None
Work Required: None
Patch Included: None
GNU Wget 1.20.3 built on darwin18.6.0.
With robots=off, wget does not download the robots.txt file:
wget -r -e robots=off https://www.robotstxt.org/
robots.txt is not downloaded, even though it is present on the site.
Downloading the root of a site with recursion (-r) or --mirror should still
save the robots.txt file, even if it is being ignored.
The robots.txt file still contains useful information for site mirroring and
archival purposes, even if it isn't being respected.
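A possible workaround until the behaviour changes, offered as a minimal sketch rather than a tested recipe: fetch robots.txt explicitly after the recursive run. The helper names robots_url and mirror_with_robots are hypothetical and not part of wget; the sketch assumes an http(s) URL with no userinfo component and the default directory layout, where -x (--force-directories) should place the file under the same host directory that -r creates.

```shell
#!/bin/sh
# Hypothetical helper: derive scheme://host/robots.txt from any URL on the site.
robots_url() {
  url=$1
  scheme=${url%%://*}   # text before "://", e.g. "https"
  rest=${url#*://}      # text after "://", e.g. "www.robotstxt.org/"
  host=${rest%%/*}      # drop the path, keep host (and port, if any)
  printf '%s://%s/robots.txt\n' "$scheme" "$host"
}

# Hypothetical wrapper: recursive download ignoring robots.txt,
# then a second request that saves robots.txt itself.
mirror_with_robots() {
  wget -r -e robots=off "$1"
  # -x forces creation of the host directory for a single-file download.
  wget -x "$(robots_url "$1")"
}
```

Usage would be `mirror_with_robots https://www.robotstxt.org/`. This costs one extra request and does not change how wget treats the file during recursion; it only ensures a copy is kept alongside the mirror.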