On Thu, 04 Aug 2016 16:47:18 +0200
Tim Ruehsen <[email protected]> wrote:

> On Wednesday, August 3, 2016 1:46:56 PM CEST Matthew White wrote:
> > On Tue, 02 Aug 2016 11:27:08 +0200
> > 
> > Tim Ruehsen <[email protected]> wrote:
> > > On Tuesday, August 2, 2016 10:06:42 AM CEST Matthew White wrote:
> > > > On Sat, 30 Jul 2016 21:23:56 +0200
> > > > 
> > > > Matthew White <[email protected]> wrote:
> > > > > Hello!
> > > > > 
> > > > > I noticed that wget doesn't use the metalink:file element as the
> > > > > RFC5854
> > > > > suggests.
> > > > > 
> > > > > https://tools.ietf.org/html/rfc5854#section-4.1.2.1
> > > > > 
> > > > > The RFC5854 specifies that the metalink:file could have a "path/file"
> > > > > format. In this case wget should create the "path" tree and save the
> > > > > file
> > > > > as "path/file", but it doesn't. Instead wget saves the file in the
> > > > > working directory.
> > > > > 
> > > > > e.g. <file name="dirA/dirB/file.gz">
> > > > > 
> > > > > With this patch wget conforms to the RFC5854.
> > > > > 
> > > > > I made this patch working on the following branch:
> > > > > master (latest 20cac2c5ab3d63aacfba35fb10878a2d490e2377)
> > > > > git://git.savannah.gnu.org/wget.git
> > > > > 
> > > > > Let me know if this helps.
> > > > 
> > > > Hi,
> > > > 
> > > > After the suggestions of Tim, I fixed the patch (nice! fix the fix...).
> > > > So,
> > > > scratch the previous patch and use this one instead.
> > > > 
> > > > The function concat_strings() replaces the combination of malloc(),
> > > > strlen(), and strcpy(). (Thanks Tim.)
> > > 
> > > In this special case (just one string to clone), please use xstrdup().
> > > That is much less overhead.
> > > 
> > > > The description now follows the style of the other patches. (Thanks
> > > > again
> > > > Tim!)
> > > > 
> > > > You may consider this patch a Bugfix:
> > > > * src/utils.c (unique_create, fopen_excl): cannot create a directory
> > > > tree
> > > > like "path/file".
> > > > 
> > > > The problem is that fopen_excl() doesn't create non-existing
> > > > directories.
> > > > What do you suggest?
> > > 
> > > IMO, saving metalink files should work 'as usual and expected'. That
> > > means, all the directory options should apply (see man wget / Directory
> > > Options). We also have to deal with 'escaping' file name sequences like
> > > ../ etc.
> > > Not to forget character set conversions.
> > > 
> > > What about cases where you download https://myserver/file.tgz, and in the
> > > metalink file 'name' is set to 'xyz.doc' ? IMO, we should keep file.tgz,
> > > except the user stated --trust-server-names.
> > > 
> > > Maybe we should set up a a metalink document where we define how
> > > everything
> > > should work including corner cases. Next step would be to set up
> > > appropriate tests and fix the wget metalink code to survive the tests.
> > > 
> > > > Can we make this patch obsolete?
> > > 
> > > Basically yes.
> > > But we see, there is still much to do around metalink ... but most of the
> > > needed code is already there. Most work will be the definition and the
> > > tests.
> > > 
> > > Regards, Tim
> > 
> > Changed to use xstrdup(), new patch attached.
> 
> I already mentioned, that taking the metalink filename pulls in a security 
> issue. It might escape the current directory with a relative or absolute path.

Actually, wget aborts if something like "../path/file" is found as 
metalink:file name (see attached document as patch). The same goes for 
"./path/file" and "/path/file".

> 
> You might just take the file name without path, but only if the the option --
> trust-server-names has been set.
> 
> Tim

Hi Tim,

I got your PM, did you get mine (I had an email server problem)?

I wrote a draft Metalink.README trying to explain the current interaction 
between "Directory Options" and the option '--include-metalink=file' on the 
command line.

I attach the Metalink.README here as a patch.

Can you confirm my findings? (Don't rush)

Later,
Matthew

-- 
Matthew White <[email protected]>
>From e15c851d6e26a118f58f79a7c69de7c51adeba9d Mon Sep 17 00:00:00 2001
From: Matthew White <[email protected]>
Date: Fri, 5 Aug 2016 20:10:19 +0200
Subject: [PATCH] Add README to describe "Directory Option" and Metalink module
 interactions

* Metalink.README

Evaluation of "Directory Options" on the command line interacting with
the option '--input-metalink=file':

$ wget --input-metalink=file <directory options>
---
 Metalink.README | 137 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 137 insertions(+)
 create mode 100644 Metalink.README

diff --git a/Metalink.README b/Metalink.README
new file mode 100644
index 0000000..31734a3
--- /dev/null
+++ b/Metalink.README
@@ -0,0 +1,137 @@
+GNU Wget Metalink module (--input-metalink)
+
+  Evaluation of "Directory Options" on the command line
+
+
+1. Introduction
+***************
+
+This document, and the results contained in it, is focused over the
+testing of the metalink:file "path/file" name format.
+
+The "Directory Options" mentioned here are used on the command line in
+conjunction with the option '--input-metalink=file':
+
+$ wget --input-metalink=file <directory options>
+
+2. Notes
+********
+
+Tests containing a metalink:file "/path/file", "./path/file", or
+"../path/file" name shall be run manually due to security concerns.
+
+3. Metalink files used as reference
+***********************************
+
+3.1 Test: metalink:file with "path/file" name format
+====================================================
+
+cat > test.meta4 << EOF
+<?xml version="1.0" encoding="UTF-8"?>
+<metalink xmlns="urn:ietf:params:xml:ns:metalink">
+  <file name="path/file">
+    <size>543</size>
+    <hash type="sha256">d37d3965f8e1a7b16504b4273b09c392776b7e4dd17e601256c7b2fd9ce5f56e</hash>
+    <hash type="md5">0f6ff5cdc15603f1b81227b5a296f001</hash>
+    <url>http://wrongurl.really/gnu/wget/wget-1.18.tar.xz.sig</url>
+    <url>http://ftpmirror.gnu.org/wget/wget-1.18.tar.xz.sig</url>
+    <url>http://ftp.gnu.org/gnu/wget/wget-1.18.tar.xz.sig</url>
+    <url>http://nl.mirror.babylon.network/gnu/wget/wget-1.18.tar.xz.sig</url>
+  </file>
+</metalink>
+EOF
+
+4. `wget --input-metalink=test.meta4`
+*************************************
+
+4.1 Implemented safety features
+===============================
+
+Do not follow relative or absolute paths: "/path/file", "./path/file",
+and "../path/file" as metalink:file name formats are all ignored (wget
+refuses to start). The options --trust-server-names changes nothing.
+
+4.2 Actual behaviour
+====================
+
+Given a metalink:file "path/file" name, if "path" exists, download
+"path/file", then compute its checksum. If "path" doesn't exist,
+download the url's file in the working directory; then the checksum
+fails: cannot find "path/file".
+
+4.3 Questionable behaviours
+===========================
+
+If more metalink:file elements are the same, wget downloads them all.
+
+4.4 Bugs
+========
+
+The download is OK even when metalink:file size is wrong.
+
+5. Directory Options
+********************
+
+'-nd'
+'--no-directories'
+
+    Used alone has no effect (see `wget --input-metalink=test.meta4`).
+
+    Used in conjunction with --recursive, given "path/file", if "path"
+    exists, download "path/file" and compute its checksum.  If "path"
+    doesnt' exist, download the url's file in the working directory,
+    then the checksum fails: cannot find "path/file".
+
+'-x'
+'--force-directories'
+
+    Given "path/file", if "path" exists, download "path/file", then
+    compute its checksum.  If "path" doesn't exist, create the url
+    hierarchy, then the checksum fails: cannot find "path/file".
+
+'-nH'
+'--no-host-directories'
+
+    Given "path/file", if "path" exists, download "path/file", then
+    compute its checksum.  If "path" doesn't exist, download the url's
+    file in the working directory, then the checksum fails: cannot
+    find "path/file"; in this context, if --force-directories is
+    present, create the url hierarchy omitting the host component.
+
+'--protocol-directories'
+
+    Used alone has no effect (see `wget --input-metalink=test.meta4`).
+
+    In conjunction with --force-directories, use the protocol name as
+    the first directory component (see --force-directories).
+
+'--cut-dirs=number'
+
+    Used alone has no effect (see `wget --input-metalink=test.meta4`).
+
+    In conjunction with --force-directories, ignore 'number' directory
+    components after the domain (see --force-directories).
+
+'-P prefix'
+'--directory-prefix=prefix'
+
+    This is buggy or non-intuitive.
+
+    Given "path/file", and more metalink:url uris for the same file,
+    if '-P path' is specified, the first url's file is downloaded as
+    "path/<url_file>", and the second url's file as "path/file". The
+    first file fails the checksum: cannot find "path/file". The file
+    "path/file" passes the checksum verification.
+
+    Given "path/file", and more metalink:url uris for the same file,
+    if '-P newp' is specified, all the urls' files are downloaded as
+    "newp/<url_file>. A suffix counter is added to the file names to
+    not overwrite existing files. Then all the checksums fail: cannot
+    find "path/file".
+
+    Given "path/file", and more metalink:url uris for the same file,
+    if '-P ../path' is specified, the same things as if '-P ../newp'
+    or '-P newp' will happen, e.g. "newp/<url_file> and checksums
+    failures.
+
+    [write here more wrong things happening]
-- 
2.7.3

Attachment: pgp0MYNe5yU9h.pgp
Description: PGP signature

Reply via email to