On 07/29/2015 05:35 AM, Ander Juaristi wrote:
> Thus, if the content hasn't been changed, the server just acts as if
> no Range header was sent.
> To me, the only sensible solution seems to be not to send
> If-Modified-Since when resuming downloads. Because if you send a
> conditional GET and the condition is met, the server will go no
> further.

So I think the issue is not just with continuation, although I had
that flag on as it is used in the script where the issue was noted.
In general the size of the file-on-disk is not checked with the
if-modified-since header

===
$ git describe
v1.16.3-90-g4e56a91

$ ./wget --debug --timestamping http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
...

$ truncate -s 1M ./cirros-0.3.4-x86_64-uec.tar.gz

$ ./wget --debug --timestamping http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Setting --timestamping (timestamping) to 1
DEBUG output created by Wget 1.16.3.90-4e56a on linux-gnu.

URI encoding = ‘UTF-8’
--2015-07-29 09:34:25--  http://download.cirros-cloud.net/0.3.4/cirros-0.3.4-x86_64-uec.tar.gz
Resolving download.cirros-cloud.net (download.cirros-cloud.net)... 69.163.241.114
Caching download.cirros-cloud.net => 69.163.241.114
Connecting to download.cirros-cloud.net (download.cirros-cloud.net)|69.163.241.114|:80... connected.
Created socket 4.
Releasing 0x00000000007db700 (new refcount 1).

---request begin---
GET /0.3.4/cirros-0.3.4-x86_64-uec.tar.gz HTTP/1.1
If-Modified-Since: Tue, 28 Jul 2015 23:34:12 GMT
User-Agent: Wget/1.16.3.90-4e56a (linux-gnu)
Accept: */*
Accept-Encoding: identity
Host: download.cirros-cloud.net
Connection: Keep-Alive

---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 304 Not Modified
Date: Tue, 28 Jul 2015 23:34:25 GMT
Server: Apache
Connection: Keep-Alive
Keep-Alive: timeout=2, max=100
ETag: "848176-51580ae5ed140"

---response end---
304 Not Modified
Registered socket 4 for persistent reuse.
File ‘cirros-0.3.4-x86_64-uec.tar.gz’ not modified on server. Omitting download.
===

There's probably a strong argument that HTTP isn't the right way to be
checking the consistency of a local file to a remote one. Even with
the old behaviour, just checking the content-length doesn't catch any
internal scrambling. But it's good enough to catch interrupted
downloads, etc.

>> 2) Without if-modified-since, if the remote end returns a 416 we don't
>> re-download if the file-on-disk is larger than the remote end.
>>
> Just thinking loudly... Maybe If-Range would be a solution here?

I don't think so, because from my reading this pairs with the Range
header, which will still be set to an invalid range due to [1] where

  /* If `-c' is in use and the file has been fully downloaded (or
     the remote file has shrunk), Wget effectively requests bytes
     after the end of file and the server response with 416
     (or 200 with a <= Content-Length.  */

i.e. if the file on-disk & at the server is 150 bytes, then "-c" will
request from 150 onwards -- the server returns 416 and we assume the
file is downloaded. However, if the local file is 200 bytes, we
follow the same path but the assumption is now really invalid.

Admittedly this case of the local file being *larger* than the
remote-file is probably pretty obscure.

Thanks,

-i

[1] http://git.savannah.gnu.org/cgit/wget.git/tree/src/http.c#n3610
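
To illustrate the 416 case described above (150 bytes local vs. 200
bytes local): per RFC 7233 a server answering 416 should include a
Content-Range header of the form "bytes */<size>", so the two sizes
could in principle be compared rather than assuming the file is
complete. The following is only a rough sketch in Python, not wget
code and not a proposed patch; the function names and the returned
labels are hypothetical, and it assumes the server actually sends
that header.

```python
import re


def remote_size_from_416(content_range):
    """Return N from a 'bytes */N' Content-Range value, or None.

    Hypothetical helper: a 416 response *should* carry this header
    (RFC 7233), but servers are not required to send it.
    """
    if not content_range:
        return None
    match = re.fullmatch(r"bytes\s+\*/(\d+)", content_range.strip())
    return int(match.group(1)) if match else None


def action_after_416(local_size, content_range):
    """Decide what a client could do after a 416 to a '-c' style request.

    Labels are illustrative only:
      'complete'   - local and remote sizes agree
      'redownload' - local file is larger than the remote one
      'retry'      - local file is smaller, so 416 was unexpected
      'unknown'    - the server gave no usable size
    """
    remote_size = remote_size_from_416(content_range)
    if remote_size is None:
        return "unknown"
    if local_size == remote_size:
        return "complete"
    if local_size > remote_size:
        return "redownload"
    return "retry"
```

With the numbers from the example, action_after_416(150, "bytes */150")
gives "complete", while action_after_416(200, "bytes */150") gives
"redownload", which is the case that currently follows the same path.
If the header is missing the sketch returns "unknown", and some other
check would be needed.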
