Hi,

The manual says

"If the local file does not exist, or the sizes of the files do not match, Wget will download the remote file no matter what the time-stamps say."

In two cases I'm not seeing this:

1) With if-modified-since I don't believe the content-length is checked at all

2) Without if-modified-since, if the remote end returns a 416 we don't re-download if the file-on-disk is larger than the remote end.

Here's a quick example where we increase the size of the file

$ ./wget http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
$ truncate -s 10M cirros-0.3.4-x86_64-uec.tar.gz # modify the file size

So firstly, when using current git, we see the "If-Modified-Since" request sent, but I guess the server does not look at "Range" because it just returns 304, despite us asking for bytes the file doesn't have. wget doesn't notice that the local file is a different size.

---
$ ./wget --debug --timestamping -c http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Setting --timestamping (timestamping) to 1
Setting --continue (continue) to 1
DEBUG output created by Wget 1.16.3.90-4e56a on linux-gnu.

URI encoding = ‘UTF-8’
--2015-07-28 13:00:28-- http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 69.163.241.114
Caching download.cirros-cloud.net => 69.163.241.114
Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|69.163.241.114|:80... connected.
Created socket 4.
Releasing 0x00000000014dc720 (new refcount 1).

---request begin---
GET /0.3.4/cirros-0.3.4-x86_64-uec.tar.gz HTTP/1.1
If-Modified-Since: Tue, 28 Jul 2015 03:00:24 GMT
Range: bytes=10485760-
User-Agent: Wget/1.16.3.90-4e56a (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: download.cirros-cloud.net
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 304 Not Modified
Date: Tue, 28 Jul 2015 03:00:30 GMT
Server: Apache
Connection: Keep-Alive
Keep-Alive: timeout=2, max=100
ETag: "848176-51580ae5ed140"

---response end---
304 Not Modified
Registered socket 4 for persistent reuse.
File ‘cirros-0.3.4-x86_64-uec.tar.gz’ not modified on server. Omitting download.
---

Using --no-if-modified-since, we see the server does notice the range and returns a 416 (Range Not Satisfiable).

---
$ ./wget --debug --no-if-modified-since --timestamping -c http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Setting --timestamping (timestamping) to 1
Setting --continue (continue) to 1
DEBUG output created by Wget 1.16.3.90-4e56a on linux-gnu.

URI encoding = ‘UTF-8’
--2015-07-28 13:00:41-- http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 69.163.241.114
Caching download.cirros-cloud.net => 69.163.241.114
Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|69.163.241.114|:80... connected.
Created socket 4.
Releasing 0x0000000000fbc6c0 (new refcount 1).

---request begin---
HEAD /0.3.4/cirros-0.3.4-x86_64-uec.tar.gz HTTP/1.1
Range: bytes=10485760-
User-Agent: Wget/1.16.3.90-4e56a (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: download.cirros-cloud.net
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 416 Requested Range Not Satisfiable
Date: Tue, 28 Jul 2015 03:00:41 GMT
Server: Apache
Vary: Accept-Encoding
Keep-Alive: timeout=2, max=100
Connection: Keep-Alive
Content-Type: text/html; charset=iso-8859-1

---response end---
416 Requested Range Not Satisfiable
Registered socket 4 for persistent reuse.
URI content encoding = ‘iso-8859-1’

The file is already fully retrieved; nothing to do.
---

So this is due to [1] where, as the comment says

  /* If `-c' is in use and the file has been fully downloaded (or
     the remote file has shrunk), Wget effectively requests bytes
     after the end of file and the server response with 416
     (or 200 with a <= Content-Length.  */

i.e. if the file on disk and the file at the server are both 150 bytes, then "-c" will request from byte 150 onwards; the server returns 416 and we assume the file is fully downloaded. However, if the local file is 200 bytes, we follow the same path, but the assumption no longer holds: the local file is larger than the remote one, so it cannot simply be a complete copy of it.

I think the first case is more important; with If-Modified-Since, the size on disk does not appear to be accounted for at all.

Thanks,

-i

[1] http://git.savannah.gnu.org/cgit/wget.git/tree/src/http.c#n3610
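
P.S. To make the two cases concrete, here is a minimal sketch of the decision I would expect from the manual's wording. It is not Wget's actual code: the names size_forces_download, is_complete_after_416 and SIZE_UNKNOWN are hypothetical, and treating an unknown remote size after a 416 as "not complete" is my assumption, not documented behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical marker for "the server did not tell us the size",
   e.g. a 304 reply that carries no Content-Length. */
#define SIZE_UNKNOWN ((int64_t) -1)

/* The manual's rule: a missing local file, or a size mismatch,
   forces a download no matter what the time-stamps say. */
static bool
size_forces_download (bool local_exists, int64_t local_size,
                      int64_t remote_size)
{
  if (!local_exists)
    return true;
  assert (local_size >= 0);
  if (remote_size == SIZE_UNKNOWN)
    return false;               /* nothing to compare; use time-stamps */
  return local_size != remote_size;
}

/* A 416 in reply to "Range: bytes=<local_size>-" only shows that
   local_size >= remote_size.  Assumption: call the file complete
   only when the sizes are known to be equal. */
static bool
is_complete_after_416 (int64_t local_size, int64_t remote_size)
{
  assert (local_size >= 0);
  return remote_size != SIZE_UNKNOWN && local_size == remote_size;
}
```

With the numbers from above, is_complete_after_416 (150, 150) is true, while is_complete_after_416 (200, 150) is false and size_forces_download (true, 200, 150) is true, so the 200-byte local file would be fetched again.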
