URL: <https://savannah.gnu.org/bugs/?56660>
Summary: wget -r or mirror with robots-off should still download robots.txt file
Project: GNU Wget
Submitted by: None
Submitted on: Tue 23 Jul 2019 03:45:32 PM UTC
Category: None
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: 1.20
Operating System: None
Reproducibility: None
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None

_______________________________________________________

Details:

GNU Wget 1.20.3 built on darwin18.6.0.

With robots=off, wget does not download the robots.txt file:

    wget -r -e robots=off https://www.robotstxt.org/

robots.txt is not downloaded even though it is present.

Expected: downloading the root of a site with recursion or --mirror should still save the robots.txt file, even if it is being ignored. The robots.txt file still contains useful information for site mirroring and archival purposes, even if it isn't being respected.
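
Until the behaviour requested above exists, one possible workaround is to fetch robots.txt in a second wget call after the recursive download. The sketch below is only an illustration, not part of Wget: the function names `robots_url` and `mirror_with_robots` are hypothetical, and it assumes the start URL includes a scheme (`https://...`) and a host without a query string attached directly to it. It relies on `-x` (`--force-directories`) so that the file lands in the same host directory that `-r` creates.

```shell
# Hypothetical helper: derive the robots.txt URL for the site a URL belongs to.
# Assumes the URL has the form scheme://host[/path...].
robots_url() {
  url=$1
  scheme=${url%%://*}   # text before the first "://"
  rest=${url#*://}      # text after the first "://"
  host=${rest%%/*}      # host (and port, if any) up to the first "/"
  printf '%s://%s/robots.txt\n' "$scheme" "$host"
}

# Hypothetical wrapper: mirror recursively while ignoring robots.txt,
# then save robots.txt explicitly into the same host directory.
mirror_with_robots() {
  wget -r -e robots=off "$1"
  # -x (--force-directories) recreates the host directory, matching -r's layout.
  wget -x "$(robots_url "$1")"
}
```

Calling `mirror_with_robots https://www.robotstxt.org/` would run the command from the report and then request `https://www.robotstxt.org/robots.txt` separately. If the site has no robots.txt, the second call simply fails without affecting the mirror.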