URL:
  <http://savannah.gnu.org/bugs/?33833>

                 Summary: -c scans binary files as if they were HTML, after receiving 416 response
                 Project: GNU Wget
            Submitted by: nok
            Submitted on: Sat 23 Jul 2011 12:16:28 CEST
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: 
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.12
        Operating System: GNU/Linux
         Reproducibility: None
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None

    _______________________________________________________

Details:

While using "wget -c -r" on a directory of large binary files, I
noticed long delays after the "The file is already fully retrieved;
nothing to do." message.

It turns out this is because the server returns a 416 response with
Content-Type: text/html, so Wget decides to scan the local file for
links, as if it were HTML.  But the file is not HTML -- only the 416
response body is.
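
To make the suspected logic error concrete, here is a minimal Python sketch
of the decision. It is not Wget's actual code (Wget is written in C), and
the function names naive_should_scan and should_scan_for_links are
hypothetical, chosen only for illustration. The first function mimics the
behaviour observed above; the second shows one possible fix, assuming the
Content-Type of a 416 reply should never be applied to the file on disk.

```python
HTML_TYPES = ("text/html", "application/xhtml+xml")


def media_type(content_type):
    """Return the bare media type, e.g. 'text/html' from 'text/html; charset=x'."""
    return content_type.split(";", 1)[0].strip().lower()


def naive_should_scan(status_code, content_type):
    """Hypothetical model of the observed behaviour: trust Content-Type always."""
    return media_type(content_type) in HTML_TYPES


def should_scan_for_links(status_code, content_type):
    """Hypothetical fix: ignore the Content-Type of a 416 error reply.

    A 416 reply describes the error page the server sent, not the file
    that is already fully retrieved on disk, so its type says nothing
    about whether the local file is HTML.
    """
    if status_code == 416:
        return False
    return media_type(content_type) in HTML_TYPES


if __name__ == "__main__":
    header = "text/html; charset=iso-8859-1"  # taken from the 416 response
    print("naive:", naive_should_scan(416, header))
    print("fixed:", should_scan_for_links(416, header))
```

With the header from the 416 response in the example, the naive check
says the .jpg should be scanned, while the status-aware check does not.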

Example:

$ cd /tmp
$ wget -c -d -r http://www.gnu.org/graphics/t-desktop-4-small.jpg

(The file is downloaded as expected, and not scanned for URLs)

$ wget -c -d -r http://www.gnu.org/graphics/t-desktop-4-small.jpg

(This time, notice in the debug output how the file was "Loaded" and
scanned for "no-follow" links.  This is the source of the delay on
large binary files).

   ---response begin---
   HTTP/1.1 416 Requested Range Not Satisfiable
   Date: Mon, 16 May 2011 21:34:24 GMT
   Server: Apache
   Vary: Accept-Encoding
   Connection: close
   Content-Type: text/html; charset=iso-8859-1
   
   ---response end---
   416 Requested Range Not Satisfiable
   
       The file is already fully retrieved; nothing to do.
   
   Closed fd 3
   Loaded www.gnu.org/graphics/t-desktop-4-small.jpg (size 30195).
   no-follow in www.gnu.org/graphics/t-desktop-4-small.jpg: 0

-jim

http://bugs.debian.org/626992




    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?33833>

_______________________________________________
  Message sent via/by Savannah
  http://savannah.gnu.org/



