On Thu, 04 Aug 2016 17:08:20 +0200 Tim Ruehsen <[email protected]> wrote:
> On Thursday, August 4, 2016 12:40:49 PM CEST Matthew White wrote: > > On Thu, 04 Aug 2016 12:08:50 +0200 > > > > Tim Ruehsen <[email protected]> wrote: > > > On Wednesday, August 3, 2016 2:40:14 PM CEST Matthew White wrote: > > > > On Wed, 03 Aug 2016 14:05:06 +0200 > > > > > > > > Tim Ruehsen <[email protected]> wrote: > > > > > On Tuesday, August 2, 2016 11:27:28 AM CEST Matthew White wrote: > > > > > > On Mon, 01 Aug 2016 16:32:30 +0200 > > > > > > > > > > > > Tim Ruehsen <[email protected]> wrote: > > > > > > > On Saturday, July 30, 2016 9:41:48 PM CEST Matthew White wrote: > > > > > > > > Hello! > > > > > > > > > > > > > > > > I think that sometimes it could help to keep downloaded > > > > > > > > Metalink's > > > > > > > > files > > > > > > > > which have a bad hash. > > > > > > > > > > > > > > > > The default wget behaviour is to delete such files. > > > > > > > > > > > > > > > > This patch provides a way to keep files which have a bad hash > > > > > > > > through > > > > > > > > the > > > > > > > > option --keep-badhash. It appends the suffix .badhash to the > > > > > > > > file > > > > > > > > name, > > > > > > > > except without overwriting existing files. In the latter case, > > > > > > > > an > > > > > > > > unique > > > > > > > > suffix is appended after .badhash. > > > > > > > > > > > > > > > > I made this patch working on the following branch: > > > > > > > > master (latest 20cac2c5ab3d63aacfba35fb10878a2d490e2377) > > > > > > > > git://git.savannah.gnu.org/wget.git > > > > > > > > > > > > > > > > What do you think? > > > > > > > > > > > > > > Hi Matthew, > > > > > > > > > > > > > > good work ! > > > > > > > > > > > > > > While your FSF assignment is underway (PM), we can continue > > > > > > > polishing. > > > > > > > > > > > > > > Didn't test your code yet, but from what I see, there are just > > > > > > > these > > > > > > > peanuts: > > > > > > > > > > > > > > 1. > > > > > > > + bhash = malloc (strlen (name) + strlen (".badhash") + 1); > > > > > > > + strcat (strcpy (bhash, name), ".badhash"); > > > > > > > > > > > > > > Please consider using concat_strings() from util.h. > > > > > > > And please leave an empty line between var declaration and code - > > > > > > > just > > > > > > > for > > > > > > > readability. > > > > > > > > > > > > > > 2. > > > > > > > Please add 'link' and 'unlink' to bootstrap.conf. Gnulib will > > > > > > > emulate > > > > > > > these on platforms where they are not available (or have a > > > > > > > slightly > > > > > > > different behavior). I guess we simply forgot 'unlink' when we > > > > > > > switched > > > > > > > to gnulib. > > > > > > > > > > > > > > 3. > > > > > > > The (GNU) commit messages should ideally look like: > > > > > > > > > > > > > > one line of short description > > > > > > > <empty line> > > > > > > > file changes > > > > > > > <empty line> > > > > > > > long description > > > > > > > > > > > > > > For example: > > > > > > > Add new option --keep-badhash > > > > > > > > > > > > > > * src/init.c: Add keepbadhash > > > > > > > * src/main.c: Add keep-badhash > > > > > > > * src/options.h: Add keep_badhash > > > > > > > * doc/wget.texi: Add docs for --keep-badhash > > > > > > > * src/metalink.h: Add prototypes badhash_suffix(), > > > > > > > badhash_or_remove() > > > > > > > * src/metalink.c: New functions badhash_suffix(), > > > > > > > badhash_or_remove(). > > > > > > > > > > > > > > (retrieve_from_metalink): Call append .badhash() > > > > > > > > > > > > > > With --keep-badhash, append .badhash to Metalink's files with > > > > > > > checksum > > > > > > > mismatch, except without overwriting existing files. > > > > > > > > > > > > > > Without --keep-badhash, remove downloaded files with checksum > > > > > > > mismatch > > > > > > > (this conforms to the old behaviour). > > > > > > > > > > > > > > [This also applies to to your other patches] > > > > > > > > > > > > > > 4. > > > > > > > Please not only write 'keepbadhash', but also what you did (e.g. > > > > > > > remove, > > > > > > > rename, add, ...), see my example above. > > > > > > > > > > > > > > Those ChangeLog entries should allow finding changes / faults / > > > > > > > regressions > > > > > > > etc. even when a versioning system is not at hand, e.g. when > > > > > > > unpacking > > > > > > > the > > > > > > > sources from a tarball. (Not debatable that consulting commit > > > > > > > messages > > > > > > > directly is much more powerful.) > > > > > > > > > > > > > > 5. > > > > > > > You write "except without overwriting existing files", maybe you > > > > > > > should > > > > > > > mention appending a counter instead of overwriting existent files > > > > > > > !? > > > > > > > > > > > > > > > > > > > > > Regards, Tim > > > > > > > > > > > > Thanks Tim! > > > > > > > > > > > > I really needed your guidance. I sent the modified patches to > > > > > > [email protected]. > > > > > > > > > > > > I believe there are more things to fix. > > > > > > > > > > > > Consider the following after applying the attached patch: > > > > > > * src/metalink.c (retrieve_from_metalink): line 124: If the download > > > > > > is > > > > > > interrupted (output_stream isn't NULL), and there are more urls for > > > > > > the > > > > > > same file (we are still in the download loop), switching to the next > > > > > > url > > > > > > should resume the download (instead than start it over). > > > > > > > > > > > > In this patch I added a fix to rename/remove the fully downloaded > > > > > > file > > > > > > (output_stream is NULL), but there was an error (probably checksum) > > > > > > and > > > > > > we > > > > > > are still in the download loop (more urls for the same file). But as > > > > > > said > > > > > > before, switching to the next url should continue the download: * > > > > > > src/metalink.c (retrieve_from_metalink): line 131: Rename/remove > > > > > > fully > > > > > > downloaded file on error > > > > > > > > > > > > I still have to investigate the problem... > > > > > > > > > > So, I wait with 0001-... !? > > > > > > > > > > 0002-... has been pushed (extended with 'symlink'). > > > > > > > > > > Thanks for your contribution. > > > > > > > > > > Tim > > > > > > > > Ok for 0002. Thanks. > > > > > > > > There's no problem applying the 0001. > > > > > > Applied and pushed ! > > > > > > > The "if the stream got interrupted, then restart the download with the > > > > next > > > > url" (output_stream isn't NULL) was already there before the patch 0001. > > > > > > > > [debug is required to know when output_stream isn't NULL] > > > > > > > > commit 7fad76db4cdb7a0fe7e5aa0dd88f5faaf8f4cdc8 > > > > * src/metalink.c (retrieve_from_metalink): line 124: 'if > > > > (output_stream)' > > > > remove file and restart download with next url > > > > > > > > With the 0001, if --keep-badhash is used the file is renamed instead > > > > than > > > > removed, even when "output_stream is NULL". > > > > > > > > I have to look through the "stream interrupted" situation. > > > > > > > > I'm guessing that the download is not resumed. > > > > > > > > What do you suggest? > > > > > > We need a document where we define wanted behavior, create tests and > > > amend the code. > > > > > > Regards, Tim > > > > Ok, I just documented a test. See the attached tarball. > > > > Also there is a patch attached (there's a copy in the tarball too), > > providing the suggested bugfix. > > I made that test without your patch => crash due to badhash_or_remove(NULL). > With your patch, valgrind says: > > ==15068== Syscall param open(filename) points to unaddressable byte(s) > ==15068== at 0x6848F40: __open_nocancel (syscall-template.S:84) > ==15068== by 0x67DFF7D: _IO_file_open (fileops.c:221) > ==15068== by 0x67E009F: _IO_file_fopen@@GLIBC_2.2.5 (fileops.c:328) > ==15068== by 0x67D4F03: __fopen_internal (iofopen.c:86) > ==15068== by 0x434FAE: retrieve_from_metalink (metalink.c:184) > ==15068== by 0x406C63: main (main.c:2025) > ==15068== Address 0x0 is not stack'd, malloc'd or (recently) free'd > ==15068== > > Can you reproduce this ? > > Tim Thanks Tim for spotting that out. I did some tests. Please, don't push the patches, let me understand the bugs first. Vanilla commit e3fb4c3859d2705fb2de80801b0bba2b64bea1a1: * Doesn't understand metalink:file "path/file" name format * Downloads "path/file" as "file" in the working directory * Fails the "path/file" checksum (cannot find "path/file") Vanilla commit, test.meta4: name="bash-4.3-patches/bash43-001" $ wget --input-metalink=test.meta4 $ wget -c --input-metalink=test.meta4 Both the aboves: File OK, checksum fails (cannot find "path/file"). Vanilla commit, test2.meta4: name="bash43-001" $ wget --input-metalink=test2.meta4 File OK, checksum OK. $ wget -c --input-metalink=test2.meta4 Segmentation fault (I'm working on it). Patch 0001 http://lists.gnu.org/archive/html/bug-wget/2016-08/msg00021.html $ wget --input-metalink=test.meta4 $ wget -c --input-metalink=test.meta4 $ wget --input-metalink=test2.meta4 $ wget -c --input-metalink=test2.meta4 All the aboves: File OK, checksum OK (understands "path/file"). Patch 0002 http://lists.gnu.org/archive/html/bug-wget/2016-08/msg00033.html $ wget --input-metalink=test.meta4 $ wget -c --input-metalink=test.meta4 Both the aboves: File OK, checksum fails (cannot find "path/file"). $ wget --input-metalink=test2.meta4 File OK, checksum OK. $ wget -c --input-metalink=test2.meta4 File OK, checksum fails (what?! I'm working on it). 0001 + 0002: All OK. I posted 0001 and 0002 before (see links above), I attach them here too. PS: sorry for the boring message. Later, Matthew -- Matthew White <[email protected]>
>From a7978778db4501b872b0946346c2dedd8e12769d Mon Sep 17 00:00:00 2001 From: Matthew White <[email protected]> Date: Thu, 28 Jul 2016 17:10:46 +0200 Subject: [PATCH 1/2] Bugfix: create/download/verify Metalink's files with a "path/file" format * src/metalink.c (retrieve_from_metalink): Set filename to mfile->name to create/download/verify files with a "path/file" format Bug: * src/utils.c (unique_create, fopen_excl): cannot create "path/file" * src/metalink.c (retrieve_from_metalink): when filename is NULL, "path/file" is downloaded as "file", and it cannot be verified The directory information contained in a metalink:file element shall be used in the same way the option --directory-prefix will. <file name="dirA/dirB/file.gz"> --directory-prefix="dirA/dirB/file.gz" This patch conforms to the RFC5854 specification: The Metalink Download Description Format 4.1.2.1. The "name" Attribute https://tools.ietf.org/html/rfc5854#section-4.1.2.1 --- src/metalink.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/src/metalink.c b/src/metalink.c index 9f34b32..dd16db2 100644 --- a/src/metalink.c +++ b/src/metalink.c @@ -156,6 +156,13 @@ retrieve_from_metalink (const metalink_t* metalink) output_stream = unique_create (mfile->name, true, &filename); output_stream_regular = true; + /* Bug: utils.h (unique_create): cannot create a + "path/file" tree. Bugfix: If filename is set to + "path/file", a tree is created in the same way as + --directory-prefix does (see RFC5854). */ + if (filename == NULL && mfile->name) + filename = xstrdup (mfile->name); + /* Store the real file name for displaying in messages. */ opt.output_document = filename; -- 2.7.3
>From 80c1b02a08c20d946f0dfd8848c27250edfde34a Mon Sep 17 00:00:00 2001 From: Matthew White <[email protected]> Date: Thu, 4 Aug 2016 11:35:42 +0200 Subject: [PATCH 2/2] Bugfix: Continue download when retrying with the next metalink:url * src/metalink.c (retrieve_from_metalink): If output_stream isn't NULL, continue download with the next mres->url Bug: * src/metalink.c (retrieve_from_metalink): If output_stream isn't NULL, restart download with the next mres->url Keep the download progress while iterating metalink:url. In such scenario, closing output_stream means to lose the download progress. --- src/metalink.c | 50 ++++++++++++++++++++++---------------------------- 1 file changed, 22 insertions(+), 28 deletions(-) diff --git a/src/metalink.c b/src/metalink.c index 1799e3a..2e296f9 100644 --- a/src/metalink.c +++ b/src/metalink.c @@ -119,24 +119,6 @@ retrieve_from_metalink (const metalink_t* metalink) retr_err = METALINK_RETR_ERROR; - /* If output_stream is not NULL, then we have failed on - previous resource and are retrying. Thus, rename/remove - the file. */ - if (output_stream) - { - fclose (output_stream); - output_stream = NULL; - badhash_or_remove (filename); - xfree (filename); - } - else if (filename) - { - /* Rename/remove the file downloaded previously before - downloading it again. */ - badhash_or_remove (filename); - xfree (filename); - } - /* Parse our resource URL. */ iri = iri_new (); set_uri_encoding (iri, opt.locale, true); @@ -156,17 +138,29 @@ retrieve_from_metalink (const metalink_t* metalink) /* Avoid recursive Metalink from HTTP headers. */ bool _metalink_http = opt.metalink_over_http; - /* Assure proper local file name regardless of the URL - of particular Metalink resource. - To do that we create the local file here and put - it as output_stream. We restore the original configuration - after we are finished with the file. */ - if (opt.always_rest) - /* continue previous download */ - output_stream = fopen (mfile->name, "ab"); + /* If output_stream is not NULL, then we have failed on + previous resource and are retrying. Thus, continue + with the next resource. Do not close output_stream + while iterating over the resources, or the download + progress will be lost. */ + if (output_stream) + { + DEBUGP (("Previous resource failed, continue with next resource.\n")); + } else - /* create a file with an unique name */ - output_stream = unique_create (mfile->name, true, &filename); + { + /* Assure proper local file name regardless of the URL + of particular Metalink resource. + To do that we create the local file here and put + it as output_stream. We restore the original configuration + after we are finished with the file. */ + if (opt.always_rest) + /* continue previous download */ + output_stream = fopen (mfile->name, "ab"); + else + /* create a file with an unique name */ + output_stream = unique_create (mfile->name, true, &filename); + } output_stream_regular = true; -- 2.7.3
test.meta4
Description: application/metalink4
test2.meta4
Description: application/metalink4
pgpFkKnAdWtuo.pgp
Description: PGP signature
