Summary: wget -r or --mirror with robots=off should still download the robots.txt file
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Tue 23 Jul 2019 03:45:32 PM UTC
                Category: None
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.20
        Operating System: None
         Reproducibility: None
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None



GNU Wget 1.20.3 built on darwin18.6.0.

With robots=off, wget does not download the robots.txt file. To reproduce:

wget -r -e robots=off https://www.robotstxt.org/

robots.txt is not downloaded even though it is present on the server.

Downloading the root of a site with recursion or --mirror should still save
the robots.txt file, even if it is being ignored.

The robots.txt file still contains useful information for site mirroring and
archival purposes, even if it isn't being respected.
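
Until wget does this itself, a possible workaround is to fetch robots.txt in a
second invocation after the recursive run. The sketch below is only an
illustration of that idea, not a proposed patch: the helper names robots_url
and mirror_with_robots are hypothetical, and it assumes the start URL includes
a scheme (for example https://) and that a POSIX shell and sed -E are
available.

```shell
#!/bin/sh
# Workaround sketch: mirror with robots=off, then fetch robots.txt separately.
# robots_url and mirror_with_robots are hypothetical helper names.

# Reduce any URL on the site to scheme://host and append /robots.txt.
robots_url() {
    printf '%s\n' "$1" |
        sed -E 's#^([A-Za-z][A-Za-z0-9+.-]*://[^/]+).*$#\1/robots.txt#'
}

# Run the recursive download while ignoring robots.txt, then save
# robots.txt itself alongside the mirrored pages.
mirror_with_robots() {
    wget -r -e robots=off "$1"
    # -x (--force-directories) makes wget create the host directory,
    # so the file is saved as <host>/robots.txt.
    wget -x "$(robots_url "$1")"
}
```

For the case in this report, calling mirror_with_robots
https://www.robotstxt.org/ would be expected to leave the file at
www.robotstxt.org/robots.txt, next to the recursively downloaded pages.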

