On Thu, Aug 28, 2008 at 7:14 AM, Ian Macfarlane <[EMAIL PROTECTED]> wrote:
> Hi Anthony,
>
> Thanks for your reply. A few comments about these changes:

Thanks again for the patience & taking the time to help out. I'm keeping
up-to-date versions at
http://metalinks.svn.sourceforge.net/viewvc/metalinks/internetdraft/draft-bryan-metalink-01.txt?view=markup

> (1) With regards to this new wording:
>
> " 6. The value "bittorrent" signifies that the IRI leads to a
> BitTorrent .torrent file as specified in [BITTORRENT]. Metalink
> Processors that do not support BitTorrent should ignore this type
> and also ignore metalink:url elements which retrieve files that
> end with the extension ".torrent"."
>
> This implies that the file extension still overrides the type
> attribute even if the type is not "bittorrent" - I might suggest
> adding to the end:
>
> ", unless the metalink:url element has a type attribute which the
> Metalink Processor supports".
>
> It's definitely a real corner case, but it's good to specify the
> correct behavior for future proofing (what if a new file format comes
> out called "bittorrent2" which extends bittorrent and uses .torrent
> files, but which existing "bittorrent1" processors can't handle?)

Yes, better to be clear. I've added it.

> Also my original point regarding the location of the ".torrent" text
> in the IRI isn't dealt with by this new text - I would suggest
> explicitly stating that this means when the IRI path ends with the
> characters ".torrent" (or alternatively, as you suggest, require the
> "type" attribute for bittorrent).

I think it's good to require the "type" attribute for bittorrent so
I've changed it. (In practice, it has always been used). This way, as
you say, FTP or HTTP etc. IRIs that don't obviously lead to a torrent
can be disregarded by Metalink Processors, even though they would most
likely be rejected anyway for not matching the file size or hash of the
other files.

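To make the effect of that change concrete, here is a rough Python sketch
of how a processor might pick the resource type once "type" is required
for bittorrent and no extension sniffing is done. The function names and
the SUPPORTED_TYPES set are made up for illustration and are not from the
draft; falling back to the IRI scheme when "type" is absent is an
assumption based on the discussion above.

```python
from urllib.parse import urlsplit

# Hypothetical: the resource types this example processor can handle.
SUPPORTED_TYPES = {"http", "https", "ftp"}


def effective_type(iri, type_attr=None):
    """Return the resource type: an explicit "type" wins, else the IRI scheme.

    No file-extension sniffing is done, so a ".torrent" suffix anywhere in
    the IRI has no effect on the result.
    """
    if type_attr:
        return type_attr.lower()
    return urlsplit(iri).scheme.lower()


def should_use(iri, type_attr=None, supported=SUPPORTED_TYPES):
    """True if this processor should attempt the resource at all."""
    return effective_type(iri, type_attr) in supported
```

With this, http://example.com/generate.php?file.torrent with no "type" is
treated as a plain HTTP resource, while any IRI marked type="bittorrent"
is skipped by a processor that does not list "bittorrent" as supported.
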
> (2):
>
> "What about requiring that "bittorrent" is used as a "type" attribute
> since .torrent files can be acquired from multiple methods, & just
> examining the IRI as you mentioned can be misleading?"
>
> The difference between the ".torrent" naming issue and the "http://"
> naming issue is that the first is simply part of the path, and doesn't
> really mean anything (there's no reason you couldn't serve a web page
> with a .torrent extension, if you have the right Content-Type).

I didn't think of that.

> However, for "http://" etc, this is the IRI's scheme itself, which has
> an explicit unalterable meaning. They're really two very different
> things.
>
> (technically ed2k/magnet/rsync URIs don't need a type either, as the
> scheme provides the required information - it's only the BitTorrent
> protocol which is different as there is not a 'torrent' URI scheme per
> se).
>
> No strong objection either way to requiring "type" for "bittorrent",
> so long as any explicit "type" attribute specified overrides any file
> type "sniffing", but I'm slightly in favour of requiring the type
> attribute for where it can't be inferred from the scheme and dropping
> sniffing altogether.

Ok, "type" attribute for bittorrent is required.

> (3):
>
> "A Metalink Processor MAY download different segments of a file from
> more than one IRI simultaneously, and when doing so SHOULD first use
> the highest priority IRIs and then use lower ones."
>
> I agree that this is a difficult one. Some possible suggestions:
>
> - When one or more resources have a value of "100", no other resources
> should be used, unless these cannot be processed (e.g. are bittorrent
> etc and this is not supported, or the servers are down).
>
> - Any resources with a value of "1" should not be used unless all
> other resources cannot be processed (e.g. are bittorrent etc and this
> is not supported, or the servers are down).
>
> I think at least those two are valuable enough to include (probably a SHOULD).

metalink:url elements MAY have a preference attribute, whose value
MUST be a number from 1 to 100 for priority, with 100 used first and
1 used last. Multiple metalink:url elements can have the same
preference, i.e. ten mirrors could have preference="100".

A Metalink Processor MAY download different segments of a file from
more than one IRI simultaneously, and when doing so SHOULD first use
the highest priority IRIs and then use lower ones.

When one or more metalink:url elements have a preference attribute
value of "100", other metalink:url elements SHOULD NOT be used, unless
these cannot be processed (e.g. are "bittorrent" etc, and this is not
supported by the Metalink Processor, or the servers are down).

Any metalink:url elements with a preference attribute value of "1"
SHOULD NOT be used unless all other metalink:url elements cannot be
processed (e.g. are "bittorrent" etc and this is not supported by the
Metalink Processor, or the servers are down).

> Lastly, it might be possible to do something based on the 'initial
> digit', e.g. if the initial digit is higher, all servers with lower
> digits should not be used (unless the higher ones cannot be
> processed), and the others should have their work distributed evenly
> based on the minor digit. For example if you have three resources with
> preferences of 89, 91 and 95 - the one with 89 would not be used
> (unless the other two can't be used), and the processor would try and
> distribute more work to the resource with a value of 95 than the one
> with 91 (e.g. 5 times more, or something along those lines - or you
> could leave the exact distribution down to the metalink processor). I
> think this sort of behavior could be no stronger than a SHOULD though.

This could be interesting, I want to consult the authors of Metalink
clients first though.

> (4):
>
> " In this example, a subdirectory debian-amd64/sarge/ will be created
> and a file named Contents-amd64.gz will be created inside it.
> The path MUST be relative. The path MUST NOT begin with a "/" or
> contain "../" or "./" Metalink Processors MUST NOT allow directory
> traversal."
>
> I think the actual correct form for this should be:
>
> " In this example, a subdirectory debian-amd64/sarge/ will be created
> and a file named Contents-amd64.gz will be created inside it. The
> path MUST be relative. The path MUST NOT begin with a "/", "./" or
> "../", contain "/../", or end with "/..". Metalink Processors MUST NOT
> allow directory traversal."
>
> (./ at the start could cause some badly written applications to change
> to their current directory, but /./ anywhere else should be fine I
> think).
>
> I think it would be good if you could get a second opinion on this
> wording from someone who knows this a bit better than I.

I've fixed it & hopefully we'll have corrections :)

> (5) It might also be worth adding information as to how to deal with
> characters which are invalid in the filesystem - I'd suggest something
> like:
>
> "A Metalink Processor MAY alter the name of the subdirectory or file
> if they contain characters which are invalid in the destination
> filesystem."
>
> (that way it can be left to the processor itself to decide what to
> rename it to on any particular filesystem, or even reject it if
> desired).

That sounds good, added.

> (6) "What do you suggest about dealing with multiple hash types?" -
> obviously it would be better for a processor to check multiple hashes,
> as it's a good way to prevent malicious altering of the files. This
> needs to be left down to the metalink processor though. Something
> like:
>
> "When multiple hash types are provided, a Metalink Processor
> MAY verify using more than one of these hash types".

Added. Currently, I think most only do one.

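For points (4) and (6), something along these lines is what the wording
would ask of a processor. This is only a Python sketch under a few
assumptions: the function names are made up, the hash type names ("md5",
"sha1", "sha256") are assumed to match the names Python's hashlib accepts,
and the rejection of a bare ".." is an extra check that the quoted wording
does not spell out. Backslash separators and characters that are invalid
on a given filesystem are not handled here.

```python
import hashlib


def is_safe_relative_path(name):
    """Check a "name" value against the revised wording from point (4)."""
    if not name:
        return False
    # MUST NOT begin with "/", "./" or "../".
    if name.startswith(("/", "./", "../")):
        return False
    # MUST NOT contain "/../" or end with "/..".
    if "/../" in name or name.endswith("/.."):
        return False
    # Assumption, not in the quoted wording: a bare ".." also escapes.
    if name == "..":
        return False
    return True


def verify_hashes(data, hashes, supported=("md5", "sha1", "sha256")):
    """Check every provided hash whose type is supported.

    Unknown hash types are skipped rather than treated as failures.
    Returns True only if at least one hash was checked and all checked
    hashes match.
    """
    checked = 0
    for hash_type, expected in hashes.items():
        if hash_type not in supported:
            continue
        digest = hashlib.new(hash_type, data).hexdigest()
        if digest != expected.strip().lower():
            return False
        checked += 1
    return checked > 0
```

Checking all supported hashes rather than just one is the "MAY verify
using more than one" option; a processor that only checks a single type
would stop after the first supported entry instead.
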
> Also you write:
>
> "An issue could be if someone malicious makes a metalink where the MD5
> matches that of something published by a legit group, but also
> includes a SHA-256 checksum, and if clients prefer & only verify
> SHA-256, then the file could appear to be good even if the downloader
> looked inside the metalink & compared the MD5 (if the legit group
> didn't also use & publish SHA-256 checksums too)."
>
> That's an interesting case, but if their metalink processor didn't
> support md5 and only supported SHA-256, it'd be the same scenario too.
> I don't think this is too much of a concern. It's not unreasonable to
> change the "MAY" suggested above for checking multiple hash types to a
> "SHOULD", but for reasons such as performance, this might not always
> be desirable, and I'm not sure you'd gain as much security as you
> might think from doing this. Most people don't even know what md5 etc
> are, and a user who cares about this will probably make sure their
> metalink processor is one which checks all hashes.
>
> It's quite in keeping with standards to write something like this:
>
> "Metalink processors are encouraged to check all hash types given
> which they are able to process"
>
> This will probably lead to checking all as the common behavior while
> not preventing people choosing not to do this for whatever reason.

Yes, I think it ultimately comes down to downloading .metalinks from
someone you trust, just like any other type of download.

> (7): "is the only thing changed?" - also the indentation changed (e.g.
> <description> should be indented one more level, as it's a child of
> <file>).

Hopefully I've got this right now, let me know.

> (8) An additional point I've just noticed - it specifies that it
> should ignore resources with a "type" of bittorrent if it is
> unsupported. ("Metalink Processors that do not support BitTorrent MUST
> ignore" ...) [as changed in your additional email from should].
> This should probably be removed from the bittorrent subsection (point
> 6 in 4.2.17.2), and moved to above the list in 4.2.17.2, and state
> something like "Metalink Processors that do not support a specified
> type of resource MUST ignore that resource". This is both future
> proof, and handles the case of Metalink Processors not supporting ed2k
> etc.

Good point.

> Best wishes
>
> Ian Macfarlane
>
> ps: No objections to forwarding any of this to the metalink list.

Thanks!

> 2008/8/28 Anthony Bryan <[EMAIL PROTECTED]>:
>> Hi Ian,
>>
>> Great comments, thank you so much for taking the time to examine this!
>> These are issues that needed to be addressed.
>>
>> Do you mind if I forward this to the metalink-discussion list?
>>
>> I'll put the changes here, let me know if they are an improvement, or
>> suggest a change.
>>
>> On Wed, Aug 27, 2008 at 7:33 AM, Ian Macfarlane <[EMAIL PROTECTED]> wrote:
>>> A few comments regarding the draft at
>>> http://tools.ietf.org/html/draft-bryan-metalink-00
>>>
>>> (1) With regards to the "type" attribute of the metalink:url element
>>> in 4.2.17.2, I think it should be made clear that this overrides any
>>> file extension sniffing specified in 4.2.17.
>>
>> 4.2.17.2. The "type" Attribute
>>
>> metalink:url elements MAY have a "type" attribute that indicates the
>> IRI type. The "type" attribute overrides any file extension sniffing
>> specified above.
>>
>>> (2) With regards to the metalink:url element in 4.2.17, it is not
>>> clear if the IRI must end with ".torrent", or if the path should, e.g.
>>> does http://example.com/file.torrent?id=1 count? What about
>>> http://example.com/generate.php?file.torrent
>>
>> 6. The value "bittorrent" signifies that the IRI leads to a
>> BitTorrent .torrent file as specified in [BITTORRENT]. Metalink
>> Processors that do not support BitTorrent should ignore this type
>> and also ignore metalink:url elements which retrieve files that
>> end with the extension ".torrent".
>>
>>> (3) Also with regards to the "type" attribute of the metalink:url
>>> element in 4.2.17.2, it's slightly inconsistent to allow both
>>> http/https/ftp/etc as well as "bittorrent" as types, as a .torrent
>>> file itself can be sent over any of these protocols. There is explicit
>>> information about the protocol from the scheme in these URLs. I would
>>> suggest "direct" (or omit altogether) for this type of file, and the
>>> Metalink Processor should infer the protocol from the scheme used.
>>
>> What about requiring that "bittorrent" is used as a "type" attribute
>> since .torrent files can be acquired from multiple methods, & just
>> examining the IRI as you mentioned can be misleading?
>>
>>> (4) With regards to the metalink:url element "preference" attribute in
>>> 4.2.17.1, it is not entirely clear if a Metalink Processor which can
>>> download simultaneously should download from two locations where one
>>> has a lower priority. A comment such as "A Metalink Processor SHOULD
>>> do xxx" would be helpful.
>>
>> I'm not sure what to put here :)
>>
>> This sentence would accurately describe what they do now. In our
>> pre-ID version, we have a "maxconnections" attribute where you can
>> limit the amount of segments for a download.
>>
>> "A Metalink Processor MAY download different segments of a file from
>> more than one IRI simultaneously, and when doing so SHOULD first use
>> the highest priority IRIs and then use lower ones."
>>
>>> (5) With regards to the "name" attribute in 4.1.3.1, instead of "Only
>>> relative paths are allowed" I think using the formal restrictive
>>> language of the standards process is a good idea here, e.g. "The path
>>> MUST be relative". It might also be a good idea to add that the path
>>> MUST NOT begin with a "/" or contain "../" (and possibly "./")
>>> [technically this should be starting with ../ or ./ or containing /..
>>> or /. I think].
>>
>> In this example, a subdirectory debian-amd64/sarge/ will be created
>> and a file named Contents-amd64.gz will be created inside it. The
>> path MUST be relative. The path MUST NOT begin with a "/" or contain
>> "../" or "./" Metalink Processors MUST NOT allow directory traversal.
>>
>>> (6) Under 4.1.4 where it says "This specification assigns no
>>> significance to the order of metalink:url elements" it might be useful
>>> to include a reference to the "preference" attribute.
>>
>> This specification assigns no significance to the order of metalink:
>> url elements. Significance is determined by the value of the
>> "preference" attribute of the metalink:url elements.
>>
>>> (7) With regards to verification (4.1.6.1 and 4.2.4.1) there is no
>>> information as to how a Metalink Processor should deal with one (can
>>> it ignore it) or deal with multiple hash types (e.g. if there is MD5
>>> and SHA1, MUST / MAY / MUST NOT it check more than one?). Also, it
>>> might be useful to extend metalink documents with new verification
>>> methods before they arrive in the standard. Perhaps unknown types
>>> could be allowed here? The same comments mostly apply to digital
>>> signatures too (4.2.13).
>>
>> I agree that unknown hash types or digital signatures should be allowed.
>>
>> This document defines nine initial values for hash types. It may be
>> useful to extend Metalink documents with new verification methods, so
>> unknown types are allowed.
>>
>> and
>>
>> metalink:signature elements MUST have a "type" attribute. The initial
>> value of "type" is the string that is non-empty and matches "pgp".
>> It may be useful to extend Metalink documents with new types of
>> digital signatures, so unknown types are allowed.
>>
>>
>> What do you suggest about dealing with multiple hash types?
>>
>> An issue could be if someone malicious makes a metalink where the MD5
>> matches that of something published by a legit group, but also
>> includes a SHA-256 checksum, and if clients prefer & only verify
>> SHA-256, then the file could appear to be good even if the downloader
>> looked inside the metalink & compared the MD5 (if the legit group
>> didn't also use & publish SHA-256 checksums too).
>>
>>> (8) Formatting nit last - the use of spacing in the nesting of XML
>>> elements is pretty inconsistent - so instead of this on page 4:
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <metalink version="3.0" xmlns="http://metalinker.org">
>>> <published>2008-05-15T12:23:23Z</published>
>>> <files>
>>> <file name="example.ext">
>>> <description>A description of the example file for download.
>>> </description>
>>> <verification>
>>> <hash type="md5">83b1a04f18d6782cfe0407edadac377f</hash>
>>> <hash type="sha1">80bc95fd391772fa61c91ed68567f0980bb45fd9
>>> </hash>
>>> </verification>
>>> <resources>
>>> <url>ftp://ftp.example.com/example.ext</url>
>>> <url>http://example.com/example.ext</url>
>>> <url>http://example.com/example.ext.torrent</url>
>>> </resources>
>>> </file>
>>> </files>
>>> </metalink>
>>>
>>> It would be much nicer if it were nested something like this:
>>>
>>> <?xml version="1.0" encoding="UTF-8" ?>
>>> <metalink version="3.0" xmlns="http://metalinker.org">
>>>   <published>2008-05-15T12:23:23Z</published>
>>>   <files>
>>>     <file name="example.ext">
>>>       <description>A description of the example file for
>>>       download.</description>
>>>       <verification>
>>>         <hash type="md5">83b1a04f18d6782cfe0407edadac377f</hash>
>>>         <hash type="sha1">80bc95fd391772fa61c91ed68567f0980bb45fd9
>>>         </hash>
>>>       </verification>
>>>       <resources>
>>>         <url>ftp://ftp.example.com/example.ext</url>
>>>         <url>http://example.com/example.ext</url>
>>>         <url>http://example.com/example.ext.torrent</url>
>>>       </resources>
>>>     </file>
>>>   </files>
>>> </metalink>
>>
>> Done, if:
>>
>> <description>A description of the example file for
>> download.</description>
>>
>> is the only thing changed?
>>
>>
>> The updated ID is draft-bryan-metalink-01pre.txt
>>
>> http://groups.google.com/group/metalink-discussion/files
>>
>> http://www.metalinker.org/ID/draft-bryan-metalink-01pre.txt
>>
>>
>> --
>> (( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
>> )) Easier, More Reliable, Self Healing Downloads
>>
>

--
(( Anthony Bryan ... Metalink [ http://www.metalinker.org ]
)) Easier, More Reliable, Self Healing Downloads
_______________________________________________
Int-area mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/int-area
