On Thu, 04 Aug 2016 16:47:18 +0200 Tim Ruehsen <[email protected]> wrote:
> On Wednesday, August 3, 2016 1:46:56 PM CEST Matthew White wrote: > > On Tue, 02 Aug 2016 11:27:08 +0200 > > > > Tim Ruehsen <[email protected]> wrote: > > > On Tuesday, August 2, 2016 10:06:42 AM CEST Matthew White wrote: > > > > On Sat, 30 Jul 2016 21:23:56 +0200 > > > > > > > > Matthew White <[email protected]> wrote: > > > > > Hello! > > > > > > > > > > I noticed that wget doesn't use the metalink:file element as the > > > > > RFC5854 > > > > > suggests. > > > > > > > > > > https://tools.ietf.org/html/rfc5854#section-4.1.2.1 > > > > > > > > > > The RFC5854 specifies that the metalink:file could have a "path/file" > > > > > format. In this case wget should create the "path" tree and save the > > > > > file > > > > > as "path/file", but it doesn't. Instead wget saves the file in the > > > > > working directory. > > > > > > > > > > e.g. <file name="dirA/dirB/file.gz"> > > > > > > > > > > With this patch wget conforms to the RFC5854. > > > > > > > > > > I made this patch working on the following branch: > > > > > master (latest 20cac2c5ab3d63aacfba35fb10878a2d490e2377) > > > > > git://git.savannah.gnu.org/wget.git > > > > > > > > > > Let me know if this helps. > > > > > > > > Hi, > > > > > > > > After the suggestions of Tim, I fixed the patch (nice! fix the fix...). > > > > So, > > > > scratch the previous patch and use this one instead. > > > > > > > > The function concat_strings() replaces the combination of malloc(), > > > > strlen(), and strcpy(). (Thanks Tim.) > > > > > > In this special case (just one string to clone), please use xstrdup(). > > > That is much less overhead. > > > > > > > The description now follows the style of the other patches. (Thanks > > > > again > > > > Tim!) > > > > > > > > You may consider this patch a Bugfix: > > > > * src/utils.c (unique_create, fopen_excl): cannot create a directory > > > > tree > > > > like "path/file". > > > > > > > > The problem is that fopen_excl() doesn't create non-existing > > > > directories. > > > > What do you suggest? > > > > > > IMO, saving metalink files should work 'as usual and expected'. That > > > means, all the directory options should apply (see man wget / Directory > > > Options). We also have to deal with 'escaping' file name sequences like > > > ../ etc. > > > Not to forget character set conversions. > > > > > > What about cases where you download https://myserver/file.tgz, and in the > > > metalink file 'name' is set to 'xyz.doc' ? IMO, we should keep file.tgz, > > > except the user stated --trust-server-names. > > > > > > Maybe we should set up a a metalink document where we define how > > > everything > > > should work including corner cases. Next step would be to set up > > > appropriate tests and fix the wget metalink code to survive the tests. > > > > > > > Can we make this patch obsolete? > > > > > > Basically yes. > > > But we see, there is still much to do around metalink ... but most of the > > > needed code is already there. Most work will be the definition and the > > > tests. > > > > > > Regards, Tim > > > > Changed to use xstrdup(), new patch attached. > > I already mentioned, that taking the metalink filename pulls in a security > issue. It might escape the current directory with a relative or absolute path. Actually, wget aborts if something like "../path/file" is found as metalink:file name (see attached document as patch). The same goes for "./path/file" and "/path/file". > > You might just take the file name without path, but only if the the option -- > trust-server-names has been set. > > Tim Hi Tim, I got your PM, did you get mine (I had an email server problem)? I wrote a draft Metalink.README trying to explain the current interaction between "Directory Options" and the option '--include-metalink=file' on the command line. I attach the Metalink.README here as a patch. Can you confirm my findings? (Don't rush) Later, Matthew -- Matthew White <[email protected]>
>From e15c851d6e26a118f58f79a7c69de7c51adeba9d Mon Sep 17 00:00:00 2001 From: Matthew White <[email protected]> Date: Fri, 5 Aug 2016 20:10:19 +0200 Subject: [PATCH] Add README to describe "Directory Option" and Metalink module interactions * Metalink.README Evaluation of "Directory Options" on the command line interacting with the option '--input-metalink=file': $ wget --input-metalink=file <directory options> --- Metalink.README | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 137 insertions(+) create mode 100644 Metalink.README diff --git a/Metalink.README b/Metalink.README new file mode 100644 index 0000000..31734a3 --- /dev/null +++ b/Metalink.README @@ -0,0 +1,137 @@ +GNU Wget Metalink module (--input-metalink) + + Evaluation of "Directory Options" on the command line + + +1. Introduction +*************** + +This document, and the results contained in it, is focused over the +testing of the metalink:file "path/file" name format. + +The "Directory Options" mentioned here are used on the command line in +conjunction with the option '--input-metalink=file': + +$ wget --input-metalink=file <directory options> + +2. Notes +******** + +Tests containing a metalink:file "/path/file", "./path/file", or +"../path/file" name shall be run manually due to security concerns. + +3. Metalink files used as reference +*********************************** + +3.1 Test: metalink:file with "path/file" name format +==================================================== + +cat > test.meta4 << EOF +<?xml version="1.0" encoding="UTF-8"?> +<metalink xmlns="urn:ietf:params:xml:ns:metalink"> + <file name="path/file"> + <size>543</size> + <hash type="sha256">d37d3965f8e1a7b16504b4273b09c392776b7e4dd17e601256c7b2fd9ce5f56e</hash> + <hash type="md5">0f6ff5cdc15603f1b81227b5a296f001</hash> + <url>http://wrongurl.really/gnu/wget/wget-1.18.tar.xz.sig</url> + <url>http://ftpmirror.gnu.org/wget/wget-1.18.tar.xz.sig</url> + <url>http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig</url> + <url>http://nl.mirror.babylon.network/gnu/wget/wget-1.18.tar.xz.sig</url> + </file> +</metalink> +EOF + +4. `wget --input-metalink=test.meta4` +************************************* + +4.1 Implemented safety features +=============================== + +Do not follow relative or absolute paths: "/path/file", "./path/file", +and "../path/file" as metalink:file name formats are all ignored (wget +refuses to start). The options --trust-server-names changes nothing. + +4.2 Actual behaviour +==================== + +Given a metalink:file "path/file" name, if "path" exists, download +"path/file", then compute its checksum. If "path" doesn't exist, +download the url's file in the working directory; then the checksum +fails: cannot find "path/file". + +4.3 Questionable behaviours +=========================== + +If more metalink:file elements are the same, wget downloads them all. + +4.4 Bugs +======== + +The download is OK even when metalink:file size is wrong. + +5. Directory Options +******************** + +'-nd' +'--no-directories' + + Used alone has no effect (see `wget --input-metalink=test.meta4`). + + Used in conjunction with --recursive, given "path/file", if "path" + exists, download "path/file" and compute its checksum. If "path" + doesnt' exist, download the url's file in the working directory, + then the checksum fails: cannot find "path/file". + +'-x' +'--force-directories' + + Given "path/file", if "path" exists, download "path/file", then + compute its checksum. If "path" doesn't exist, create the url + hierarchy, then the checksum fails: cannot find "path/file". + +'-nH' +'--no-host-directories' + + Given "path/file", if "path" exists, download "path/file", then + compute its checksum. If "path" doesn't exist, download the url's + file in the working directory, then the checksum fails: cannot + find "path/file"; in this context, if --force-directories is + present, create the url hierarchy omitting the host component. + +'--protocol-directories' + + Used alone has no effect (see `wget --input-metalink=test.meta4`). + + In conjunction with --force-directories, use the protocol name as + the first directory component (see --force-directories). + +'--cut-dirs=number' + + Used alone has no effect (see `wget --input-metalink=test.meta4`). + + In conjunction with --force-directories, ignore 'number' directory + components after the domain (see --force-directories). + +'-P prefix' +'--directory-prefix=prefix' + + This is buggy or non-intuitive. + + Given "path/file", and more metalink:url uris for the same file, + if '-P path' is specified, the first url's file is downloaded as + "path/<url_file>", and the second url's file as "path/file". The + first file fails the checksum: cannot find "path/file". The file + "path/file" passes the checksum verification. + + Given "path/file", and more metalink:url uris for the same file, + if '-P newp' is specified, all the urls' files are downloaded as + "newp/<url_file>. A suffix counter is added to the file names to + not overwrite existing files. Then all the checksums fail: cannot + find "path/file". + + Given "path/file", and more metalink:url uris for the same file, + if '-P ../path' is specified, the same things as if '-P ../newp' + or '-P newp' will happen, e.g. "newp/<url_file> and checksums + failures. + + [write here more wrong things happening] -- 2.7.3
pgp0MYNe5yU9h.pgp
Description: PGP signature
