[Bug-wget] [bug #56660] wget -r or mirror with robots-off should still download robots.txt file

anonymous Tue, 23 Jul 2019 08:46:00 -0700

URL:
  <https://savannah.gnu.org/bugs/?56660>


                 Summary: wget -r or mirror with robots-off should still
download robots.txt file
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Tue 23 Jul 2019 03:45:32 PM UTC
                Category: None
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.20
        Operating System: None
         Reproducibility: None
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None

    _______________________________________________________

Details:

GNU Wget 1.20.3 built on darwin18.6.0.

With robots=off, wget does not download the robots.txt file.

wget -r -e robots=off https://www.robotstxt.org/
robots.txt is not downloaded, even though it is present on the server.

Expected:
Downloading the root of a site with recursion or --mirror should still save
the robots.txt file, even if its rules are being ignored.

The robots.txt file still contains useful information for site mirroring and
archival purposes, even if it isn't being respected.
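
Until wget behaves as described above, one possible workaround is to pass the
robots.txt URL explicitly alongside the site root. The sketch below is only an
illustration, not part of wget: the function names robots_url and
mirror_with_robots are hypothetical, and it assumes that robots.txt lives at
the root of the host and that a URL named on the command line is saved into
the mirror tree like any other file.

```shell
#!/bin/sh
# Hypothetical helper (not part of wget): reduce a URL to scheme://host
# and append /robots.txt.
robots_url() {
    printf '%s\n' "$1" |
        sed -E 's#^([a-zA-Z][a-zA-Z0-9+.-]*://[^/]+).*#\1/robots.txt#'
}

# Hypothetical wrapper: mirror the site with robots rules ignored, and
# request robots.txt explicitly so that a copy is saved as well.
mirror_with_robots() {
    site=$1
    wget -r -e robots=off "$site" "$(robots_url "$site")"
}
```

After sourcing these definitions, running
mirror_with_robots https://www.robotstxt.org/ should, under the assumptions
above, leave a copy at www.robotstxt.org/robots.txt next to the mirrored pages.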





    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?56660>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
