On 2/12/19 12:21 AM, Darshit Shah wrote:
> * Tim Rühsen <tim.rueh...@gmx.de> [190211 13:45]:
>> You are right, --if-modified-since changes -N behavior in case a file is
>> incomplete. --if-modified-since can't easily be fixed since the 304
>> response does not include file size information.
>>
>> As you suggest, we should disable this option by default or at least
>> discuss the options we have.
>
> That's correct. While the lack of a Content-Length header on a 304 response
> causes problems, we can't rely on it to exist even for normal 200 / 206
> responses.
>
> Let me try to aggregate some of the possible options (I'm not saying any of
> these are particularly good ideas):
>
> 1. Write the file to a tmpfile and, on successful download, move it to the
>    real location.
>    This option has multiple problems. Firstly, people don't expect Wget to
>    write to a tmp file. This can be problematic, especially when people try
>    to play streaming data without a -O. But for the purposes of dealing with
>    -N and --if-modified-since, this is the best option.
>
> 2. Issue a utime() call after every write() in order to set the mtime again
>    to something older than the one reported by the server.
>    In this, we would need to issue a utime() after each call to write() in
>    order to reset its mtime to an earlier time. After the file is fully
>    downloaded, set the mtime to the actual one as provided by the server.
>    This introduces an issue where Wget is issuing too many system calls. And
>    with Wget2, it might get really bad due to downloading ~30+ files in
>    parallel. I'm also unsure of how the kernel handles races between write()
>    and utime() calls. We don't want to set the mtime of the file and have it
>    overwritten by the previous write() call. This might be a valid option,
>    especially since it is cross platform. However, the performance impact
>    would need to be evaluated.
>
> 3. Only enable If-Modified-Since when xattr is available.
>    The idea here is simple: on systems where xattr is possible, store either
>    an old timestamp or a completion flag in the attributes. Use this metadata
>    to issue an If-Modified-Since header. If xattr is not available or the
>    attributes are not found, use the HEAD+GET approach.
>
>
> Are there any other options that I've missed?

4. Do not use --if-modified-since by default with -N - let the user control it.

We only have an issue if the -N download gets interrupted and should be
continued later. This is often not the case - like in my personal
interactive '-r -N' scenarios.
Of course it's error-prone for users who are not aware of it. But you asked
for other options.

Didn't I solve that issue for Wget2 already?

From src/wget.c (http_receive_response):

    if (resp->last_modified) {
        /* If program was aborted, we store file times one second less than the server time.
         * So a later download with -N would start over instead of leaving incomplete data.
         * Or a later download with -c -N would continue with a IF-MODIFIED-SINCE: HTTP header. */
        if (config.xattr && !terminate)
            write_xattr_last_modified(resp->last_modified, context->outfd);

        set_file_mtime(context->outfd, resp->last_modified - terminate);
    }

Regards, Tim

>
>> On 2/10/19 2:42 PM, Lawrence Wade wrote:
>>> Hi Tim,
>>>
>>> Okay. Using the OpenSUSE-packaged wget (1.19.5) that comes with Leap 15.0:
>>>
>>> $ wget -r -N 192.168.2.100:8080
>>> ...
>>> Reusing existing connection to 192.168.2.100:8080.
>>> HTTP request sent, awaiting response... 304 Not Modified
>>> File ‘192.168.2.100:8080/OaP6ysTyz6Y.mp4’ not modified on server. Omitting download.
>>>
>>> This file is incomplete in my local copy.
>>>
>>> Trying again as you suggest,
>>>
>>> $ wget -r -N --no-if-modified-since 192.168.2.100:8080
>>> ...
>>> --2019-02-10 08:35:14-- http://192.168.2.100:8080/OaP6ysTyz6Y.mp4
>>> Reusing existing connection to 192.168.2.100:8080.
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 38044195 (36M) [application/octet-stream]
>>> The sizes do not match (local 8643456) -- retrieving.
>>> --2019-02-10 08:35:14-- http://192.168.2.100:8080/OaP6ysTyz6Y.mp4
>>> Reusing existing connection to 192.168.2.100:8080.
>>> HTTP request sent, awaiting response... 200 OK
>>> Length: 38044195 (36M) [application/octet-stream]
>>> Saving to: ‘192.168.2.100:8080/OaP6ysTyz6Y.mp4
>>> ...
>>>
>>> And it appears to work as expected. Won't this change to the behaviour
>>> of the -N option subtly break a lot of scripts which rely on wget?
>>>
>>> Thanks so much, Tim. I do have an answer and a workaround, though my
>>> concerns remain.
>>>
>>> Lawrence Wade
>>> Ottawa, Canada
>>>
>>> On Sun, Feb 10, 2019 at 2:11 AM Lawrence Wade <lawrencepw...@gmail.com>
>>> wrote:
>>>>
>>>> Hi Everyone,
>>>>
>>>> This might be a corroboration of this
>>>> http://lists.gnu.org/archive/html/bug-wget/2018-10/msg00049.html
>>>> and this
>>>> https://bugs.launchpad.net/ubuntu/+source/wget/+bug/1715481
>>>>
>>>> I use wget to back up my cellphone running Palapa Web Server, and it
>>>> has worked well for me for years. Since upgrading to OpenSUSE Leap 15,
>>>> I have been having corrupted files.
>>>>
>>>> My method is
>>>> $ wget -r -N 192.168.2.100:8080
>>>> and if the connection is interrupted for any reason, the next time I
>>>> call wget it would complete any incomplete files. And since Leap 15, I
>>>> have been getting gradually corrupted backups. I was tearing my hair
>>>> out looking at wgetrc and other things.
>>>>
>>>> With one long file that I knew was incomplete, I got a Not Modified -
>>>> omitting download, even though I knew the file sizes were different
>>>> between the server and wget's copy - though the wget man page
>>>> explicitly states that if the file sizes do not match, -N will trigger
>>>> a download.
>>>>
>>>> I tried on OpenSUSE 42.3 (wget 1.14) and the incomplete file triggered
>>>> a download, even though wgetrc was identical.
>>>>
>>>> Again, on Leap 15, I compiled 1.20.1 (latest), 1.17.1, and then
>>>> finally with 1.16.3 the behaviour went back to what I expected (and I
>>>> got my corrupted phone backups fixed).
>>>>
>>>> Was a bug possibly introduced in 1.17 with the support for
>>>> --if-modified-since?
>>>>
>>>> Version shipping with OpenSUSE Leap 15:
>>>> GNU Wget 1.19.5 built on linux-gnu.
>>>> +cares +digest +gpgme +https +ipv6 +iri +large-file +metalink +nls
>>>> +ntlm +opie +psl +ssl/openssl
>>>>
>>>> Last version I tried where "wget -r -N" works as expected:
>>>> GNU Wget 1.16.3 built on linux-gnu.
>>>> +digest +https +ipv6 -iri +large-file +nls +ntlm +opie +psl +ssl/gnutls
>>>>
>>>> I'm open to the possibility that there may be something else causing
>>>> this bug. I have not found many mentions of it, but then again it is
>>>> subtle. You get pretty confident when you just let wget do its thing,
>>>> so there may be a lot of incomplete files out there... :)
>>>>
>>>> Thanks so much for your help. I can provide any other info that would
>>>> be helpful.
>>>>
>>>> Lawrence Wade
>>>> Ottawa, Canada
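
For anyone following the thread, here is a minimal sketch in C of why stamping an aborted download one second older than the server's Last-Modified (the `resp->last_modified - terminate` idea from the Wget2 snippet in this message) makes a later -N run fetch the file again. The function names are hypothetical and are not part of Wget or Wget2, and the 304 check is a simplified assumption about how a server evaluates If-Modified-Since, not a description of any particular server.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper (not Wget2 API): the mtime to stamp on the local file.
 * An aborted transfer is stamped one second older than the server's
 * Last-Modified time; a completed transfer gets the server time unchanged. */
time_t mtime_to_store(time_t server_last_modified, bool aborted)
{
    return aborted ? server_last_modified - 1 : server_last_modified;
}

/* Simplified model (assumption) of the server's If-Modified-Since check:
 * it answers 304 Not Modified when the resource is not newer than the
 * timestamp the client sent, which here is the local file's mtime. */
bool server_answers_304(time_t local_mtime, time_t server_last_modified)
{
    return server_last_modified <= local_mtime;
}

/* True if a later -N run would fetch the file again instead of skipping it. */
bool would_refetch(time_t server_last_modified, bool aborted)
{
    time_t local_mtime = mtime_to_store(server_last_modified, aborted);
    return !server_answers_304(local_mtime, server_last_modified);
}
```

Under these assumptions, `would_refetch(t, true)` holds for an interrupted file because its stored mtime is `t - 1`, while a completed file stamped with `t` gets a 304 and is skipped. The sketch does not cover the case reported here, where the partial file carries a timestamp that is not older than the server's.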