On Wed, 10 Aug 2016 11:30:12 +0200 Tim Rühsen <[email protected]> wrote:
> On Mittwoch, 10. August 2016 06:23:49 CEST Matthew White wrote: > > On Tue, 09 Aug 2016 21:25:16 +0200 > > > > Tim Rühsen <[email protected]> wrote: > > > On Freitag, 5. August 2016 20:25:06 CEST Matthew White wrote: > > > > On Thu, 04 Aug 2016 16:47:18 +0200 > > > > > > > > Tim Ruehsen <[email protected]> wrote: > > > > > On Wednesday, August 3, 2016 1:46:56 PM CEST Matthew White wrote: > > > > > > On Tue, 02 Aug 2016 11:27:08 +0200 > > > > > > > > > > > > Tim Ruehsen <[email protected]> wrote: > > > > > > > On Tuesday, August 2, 2016 10:06:42 AM CEST Matthew White wrote: > > > > > > > > On Sat, 30 Jul 2016 21:23:56 +0200 > > > > > > > > > > > > > > > > Matthew White <[email protected]> wrote: > > > > > > > > > Hello! > > > > > > > > > > > > > > > > > > I noticed that wget doesn't use the metalink:file element as > > > > > > > > > the > > > > > > > > > RFC5854 > > > > > > > > > suggests. > > > > > > > > > > > > > > > > > > https://tools.ietf.org/html/rfc5854#section-4.1.2.1 > > > > > > > > > > > > > > > > > > The RFC5854 specifies that the metalink:file could have a > > > > > > > > > "path/file" > > > > > > > > > format. In this case wget should create the "path" tree and > > > > > > > > > save > > > > > > > > > the > > > > > > > > > file > > > > > > > > > as "path/file", but it doesn't. Instead wget saves the file in > > > > > > > > > the > > > > > > > > > working directory. > > > > > > > > > > > > > > > > > > e.g. <file name="dirA/dirB/file.gz"> > > > > > > > > > > > > > > > > > > With this patch wget conforms to the RFC5854. > > > > > > > > > > > > > > > > > > I made this patch working on the following branch: > > > > > > > > > master (latest 20cac2c5ab3d63aacfba35fb10878a2d490e2377) > > > > > > > > > git://git.savannah.gnu.org/wget.git > > > > > > > > > > > > > > > > > > Let me know if this helps. > > > > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > After the suggestions of Tim, I fixed the patch (nice! fix the > > > > > > > > fix...). > > > > > > > > So, > > > > > > > > scratch the previous patch and use this one instead. > > > > > > > > > > > > > > > > The function concat_strings() replaces the combination of > > > > > > > > malloc(), > > > > > > > > strlen(), and strcpy(). (Thanks Tim.) > > > > > > > > > > > > > > In this special case (just one string to clone), please use > > > > > > > xstrdup(). > > > > > > > That is much less overhead. > > > > > > > > > > > > > > > The description now follows the style of the other patches. > > > > > > > > (Thanks > > > > > > > > again > > > > > > > > Tim!) > > > > > > > > > > > > > > > > You may consider this patch a Bugfix: > > > > > > > > * src/utils.c (unique_create, fopen_excl): cannot create a > > > > > > > > directory > > > > > > > > tree > > > > > > > > like "path/file". > > > > > > > > > > > > > > > > The problem is that fopen_excl() doesn't create non-existing > > > > > > > > directories. > > > > > > > > What do you suggest? > > > > > > > > > > > > > > IMO, saving metalink files should work 'as usual and expected'. > > > > > > > That > > > > > > > means, all the directory options should apply (see man wget / > > > > > > > Directory > > > > > > > Options). We also have to deal with 'escaping' file name sequences > > > > > > > like > > > > > > > ../ etc. > > > > > > > Not to forget character set conversions. > > > > > > > > > > > > > > What about cases where you download https://myserver/file.tgz, and > > > > > > > in > > > > > > > the > > > > > > > metalink file 'name' is set to 'xyz.doc' ? IMO, we should keep > > > > > > > file.tgz, > > > > > > > except the user stated --trust-server-names. > > > > > > > > > > > > > > Maybe we should set up a a metalink document where we define how > > > > > > > everything > > > > > > > should work including corner cases. Next step would be to set up > > > > > > > appropriate tests and fix the wget metalink code to survive the > > > > > > > tests. > > > > > > > > > > > > > > > Can we make this patch obsolete? > > > > > > > > > > > > > > Basically yes. > > > > > > > But we see, there is still much to do around metalink ... but most > > > > > > > of > > > > > > > the > > > > > > > needed code is already there. Most work will be the definition and > > > > > > > the > > > > > > > tests. > > > > > > > > > > > > > > Regards, Tim > > > > > > > > > > > > Changed to use xstrdup(), new patch attached. > > > > > > > > > > I already mentioned, that taking the metalink filename pulls in a > > > > > security > > > > > issue. It might escape the current directory with a relative or > > > > > absolute > > > > > path. > > > > > > > > Actually, wget aborts if something like "../path/file" is found as > > > > metalink:file name (see attached document as patch). The same goes for > > > > "./path/file" and "/path/file". > > > > > > > > > You might just take the file name without path, but only if the the > > > > > option > > > > > -- trust-server-names has been set. > > > > > > > > > > Tim > > > > > > > > Hi Tim, > > > > > > > > I got your PM, did you get mine (I had an email server problem)? > > > > > > > > I wrote a draft Metalink.README trying to explain the current > > > > interaction > > > > between "Directory Options" and the option '--include-metalink=file' on > > > > the > > > > command line. > > > > > > > > I attach the Metalink.README here as a patch. > > > > > > > > Can you confirm my findings? (Don't rush) > > > > > > I created a new upstream branch 'metalink' where we can put further work > > > for testing. I included your README as doc/metalink.txt and also added > > > two tests for '../File1' and '/File1', which currently fail (the tests > > > expect 'File1' to be written). > > > > > > You can't push there, but maybe you clone it (e.g. to Gitlab, Github or > > > elsewhere) and push your changes to that repo. So I can directly merge > > > from > > > there (and you could then merge from upstream 'metalink'). > > > > > > I still didn't find time to think about the Metalink file naming > > > details... > > > > > > Regards, Tim > > > > Hi Tim, > > > > I forked https://github.com/mirror/wget. > > > > As you suggest, now I'm pushing to the metalink branch on my repo: > > https://github.com/mehw/wget/tree/metalink > > > > As you said, escaping relative and absolute paths from metalink:file names, > > except when the option --trust-server-names is used, seems a good idea. > > I'll work on this ASAP. > > I thought about it when I awoke this night... > > Let's assume the user types basically > wget http://theserver.com/directory/archive.tgz > The server responds with a metalink file (or metalink within request header), > wget retrieves the .meta file, which states the file name > '/the/path/funny.sh'. > > IMO, the (optional) directory options should apply to the URL path of the > command line (http://theserver.com/directory/). > > The downloaded/saved file name should be 'archive.tgz' - except with --trust- > server-names, here the file name should be 'funny.sh' (but still, any > directory > options should apply to the command line URL). > > In general, I would recommend to tell the user when the filename from his URL > and the file name in the Metalink info differ. If --trust-server-names was > not > given, we should give this as hint. That's very good! Something like 'metalink:file name and URL file name are different: <file.meta>'? > > But we should *always* strip the path info from the metalink filename (/the/ > path/funny.sh -> funny.sh), if appropriate (--trust-server-names is given). I think that if you use --input-metalink=<your-own-and-trusted.meta>, the path info could be useful to (re)create a directory tree. But, *never allow absolute paths*, and *always strip './' and '../'* rules shall apply by default, e.g. '/the/../path/./funny.sh' -> 'the/path/funny.sh'. [tell the user when stripping the path and allowing exceptions? WDYT?] Basically, let the user use the metalink path info 'the/path/' and file name 'funny.sh' in the safest way possible when --trust-server-names and adequate directory options are used. > > If there are several files with the same name in the Metalink info and > --trust- > server-names is given, we should apply the standard .1, .2, .3 ... extensions. Yes, enforce the default behaviour. Good. > > WDYT ? > > > Shall I create a new branch or push directly to metalink in my repo? > > Just push directly to your repo. > > Regards, Tim Very well though, Tim! [did you go back to sleep...?] I'm working on a bleeding draft (still need some time, sorry). Soon, Matthew -- Matthew White <[email protected]>
pgpn6U5FYj9T_.pgp
Description: PGP signature
