URL:
<http://savannah.gnu.org/bugs/?33833>
Summary: -c scans binary files as if they were html, after
receiving 416 response
Project: GNU Wget
Submitted by: nok
Submitted on: Sat 23 Jul 2011 12:16:28 CEST
Category: Program Logic
Severity: 3 - Normal
Priority: 5 - Normal
Status: None
Privacy: Public
Assigned to: None
Originator Name:
Originator Email:
Open/Closed: Open
Discussion Lock: Any
Release: 1.12
Operating System: GNU/Linux
Reproducibility: None
Fixed Release: None
Planned Release: None
Regression: None
Work Required: None
Patch Included: None
_______________________________________________________
Details:
While using "wget -c -r" on a directory of large binary files, I
noticed long delays after the "The file is already fully retrieved;
nothing to do." message.
It turns out this is because the server returns a 416 response with
Content-Type: text/html, and so Wget decides to scan the local file for
links, as if it were HTML. But the file is not HTML -- only the 416
response body is.
Example:
$ cd /tmp
$ wget -c -d -r http://www.gnu.org/graphics/t-desktop-4-small.jpg
(The file is downloaded as expected, and not scanned for URLs)
$ wget -c -d -r http://www.gnu.org/graphics/t-desktop-4-small.jpg
(This time, notice in the debug output how the file was "Loaded" and
scanned for "no-follow" links. This is the source of the delay on
large binary files).
---response begin---
HTTP/1.1 416 Requested Range Not Satisfiable
Date: Mon, 16 May 2011 21:34:24 GMT
Server: Apache
Vary: Accept-Encoding
Connection: close
Content-Type: text/html; charset=iso-8859-1
---response end---
416 Requested Range Not Satisfiable
The file is already fully retrieved; nothing to do.
Closed fd 3
Loaded www.gnu.org/graphics/t-desktop-4-small.jpg (size 30195).
no-follow in www.gnu.org/graphics/t-desktop-4-small.jpg: 0
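To make the expected behaviour concrete, here is a minimal sketch in Python of the decision I would expect. It is not Wget's actual code: the names Response and should_scan_for_links are hypothetical, and falling back to the file extension on a 416 is only an assumption about one reasonable fix.

```python
from dataclasses import dataclass


@dataclass
class Response:
    """Hypothetical stand-in for the parts of an HTTP reply that matter here."""
    status: int
    content_type: str


def should_scan_for_links(resp: Response, local_name: str) -> bool:
    """Decide whether the local file should be parsed for links."""
    # Drop parameters such as "; charset=iso-8859-1" before comparing.
    media_type = resp.content_type.split(";", 1)[0].strip().lower()
    if resp.status == 416:
        # The Content-Type of a 416 reply describes the error page, not the
        # file already on disk, so it must not be trusted. Assumption: fall
        # back to the local file name instead.
        return local_name.lower().endswith((".html", ".htm"))
    return media_type in ("text/html", "application/xhtml+xml")


# The case from this report: 416 with text/html for an already complete JPEG.
reply = Response(416, "text/html; charset=iso-8859-1")
print(should_scan_for_links(reply, "www.gnu.org/graphics/t-desktop-4-small.jpg"))
```

With a check along these lines, the second run in the example above would skip the "Loaded ..." step for the JPEG, while a fully retrieved .html file would still be scanned during recursion.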
-jim
http://bugs.debian.org/626992
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?33833>
_______________________________________________
Message sent via/by Savannah
http://savannah.gnu.org/