On Tue, Sep 2, 2008 at 8:23 AM, Ian Macfarlane <[EMAIL PROTECTED]> wrote:
> Dear Anthony,
>
> One extra alteration where I think the wording could be slightly tidied up:
>
> "When one or more metalink:url elements have a preference attribute
> value of "100", other metalink:url elements SHOULD NOT be used,
> unless these cannot be processed (e.g. are "bittorrent" etc, and this
> is not supported by the Metalink Processor, or the servers are down)."
>
> Here "these" could potentially be misread as the "non-100" elements
> rather than the "100" elements. I think slightly clarifying the
> wording here would be beneficial. I suggest something along the lines
> of:
>
> "When one or more metalink:url elements have a preference attribute
> value of "100", other metalink:url elements SHOULD NOT be used, unless
> the elements with a preference of 100 cannot be processed (e.g. if
> they are of a type which is not supported by the Metalink Processor,
> such as bittorrent, or if the servers are unavailable)."
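
A minimal sketch of the preference="100" rule as reworded above, in
Python. The Url class, select_urls and the can_process callback are
hypothetical names used only for illustration; they are not part of the
draft. can_process stands in for however a Metalink Processor decides
that a type is unsupported or a server is unavailable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Url:
    """Hypothetical stand-in for one metalink:url element."""
    iri: str
    preference: int  # 1..100, with 100 used first
    type: str = ""   # e.g. "bittorrent"; empty when no type attribute is given


def select_urls(urls: List[Url], can_process: Callable[[Url], bool]) -> List[Url]:
    """Return the elements a processor would use under the "100" rule.

    If any element with preference 100 can be processed, only those are
    returned. Otherwise the remaining processable elements are returned,
    highest preference first.
    """
    usable = [u for u in urls if can_process(u)]
    top = [u for u in usable if u.preference == 100]
    if top:
        return top
    return sorted(usable, key=lambda u: u.preference, reverse=True)
```

With a processor that lacks BitTorrent support, a preference="100"
torrent is skipped, but an HTTP mirror at 100 still keeps the lower
mirrors out of use.
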

Changed.

> Also, I still think the "type" definitions such as "http", "https" etc
> should be removed, as per the reasons given in the previous emails.

It looks like you're getting your way. :) You & others have made good
points against it.

> Thank you for taking the time to look at my suggestions.

Thanks again for taking the time to make them. Your comments were the
first & very helpful!

> ps: I have not had a chance to look through the entire revised
> document - these comments are based on the revisions described in your
> email.
>
> 2008/8/28 Anthony Bryan <[EMAIL PROTECTED]>:
>> On Thu, Aug 28, 2008 at 7:14 AM, Ian Macfarlane <[EMAIL PROTECTED]> wrote:
>>> Hi Anthony,
>>>
>>> Thanks for your reply. A few comments about these changes:
>>
>> Thanks again for the patience & taking the time to help out.
>>
>> I'm keeping up to date versions at
>> http://metalinks.svn.sourceforge.net/viewvc/metalinks/internetdraft/draft-bryan-metalink-01.txt?view=markup
>>
>>> (1) With regards to this new wording:
>>>
>>> " 6. The value "bittorrent" signifies that the IRI leads to a
>>> BitTorrent .torrent file as specified in [BITTORRENT]. Metalink
>>> Processors that do not support BitTorrent should ignore this type
>>> and also ignore metalink:url elements which retrieve files that
>>> end with the extension ".torrent"."
>>>
>>> This implies that the file extension still overrides the type
>>> attribute even if the type is not "bittorrent" - I might suggest
>>> adding to the end:
>>>
>>> ", unless the metalink:url element has a type attribute which the
>>> Metalink Processor supports".
>>>
>>> It's definitely a real corner case, but it's good to specify the
>>> correct behavior for future proofing (what if a new file format comes
>>> out called "bittorrent2" which extends bittorrent and uses .torrent
>>> files, but which existing "bittorrent1" processors can't handle?)
>>
>> Yes, better to be clear. I've added it.
>>
>>> Also my original point regarding the location of the ".torrent" text
>>> in the IRI isn't dealt with by this new text - I would suggest
>>> explicitly stating that this means when the IRI path ends with the
>>> characters ".torrent" (or alternatively, as you suggest, require the
>>> "type" attribute for bittorrent).
>>
>> I think it's good to require the "type" attribute for bittorrent so
>> I've changed it. (In practice, it has always been used). This way, as
>> you say, FTP or HTTP etc IRIs that don't obviously lead to a torrent
>> can be disregarded by Metalink processors, even tho they would be most
>> likely by not having a matching file size or the correct hash as the
>> other files.
>>
>>> (2):
>>>
>>> "What about requiring that "bittorrent" is used as a "type" attribute
>>> since .torrent files can be acquired from multiple methods, & just
>>> examining the IRI as you mentioned can be misleading?"
>>>
>>> The difference between the ".torrent" naming issue and the "http://"
>>> naming issue is that the first is simply part of the path, and doesn't
>>> really mean anything (there's no reason you couldn't serve a web page
>>> with a .torrent extension, if you have the right Content-Type).
>>
>> I didn't think of that.
>>
>>> However, for "http://" etc, this is the IRI's scheme itself, which has
>>> an explicit unalterable meaning. They're really two very different
>>> things.
>>>
>>> (technically ed2k/magnet/rsync URIs don't need a type either, as the
>>> scheme provides the required information - it's only the BitTorrent
>>> protocol which is different as there is not a 'torrent' URI scheme per
>>> se).
>>>
>>> No strong objection either way to requiring "type" for "bittorrent",
>>> so long as any explicit "type" attribute specified overrides any file
>>> type "sniffing", but I'm slightly in favour of requiring the type
>>> attribute for where it can't be inferred from the scheme and dropping
>>> sniffing altogether.
>>
>> Ok, "type" attribute for bittorrent is required.
>>
>>> (3):
>>>
>>> "A Metalink Processor MAY download different segments of a file from
>>> more than one IRI simultaneously, and when doing so SHOULD first use
>>> the highest priority IRIs and then use lower ones."
>>>
>>> I agree that this is a difficult one. Some possible suggestions:
>>>
>>> - When one or more resources have a value of "100", no other resources
>>> should be used, unless these cannot be processed (e.g. are bittorrent
>>> etc and this is not supported, or the servers are down).
>>>
>>> - Any resources with a value of "1" should not be used unless all
>>> other resources cannot be processed (e.g. are bittorrent etc and this
>>> is not supported, or the servers are down).
>>>
>>> I think at least those two are valuable enough to include (probably a
>>> SHOULD).
>>
>> metalink:url elements MAY have a preference attribute, whose value
>> MUST be a number from 1 to 100 for priority, with 100 used first and
>> 1 used last. Multiple metalink:url elements can have the same
>> preference, i.e. ten mirrors could have preference="100". A Metalink
>> Processor MAY download different segments of a file from more than
>> one IRI simultaneously, and when doing so SHOULD first use the
>> highest priority IRIs and then use lower ones.
>>
>> When one or more metalink:url elements have a preference attribute
>> value of "100", other metalink:url elements SHOULD NOT be used,
>> unless these cannot be processed (e.g. are "bittorrent" etc, and this
>> is not supported by the Metalink Processor, or the servers are down).
>>
>> Any metalink:url elements with a preference attribute value of "1"
>> SHOULD NOT be used unless all other metalink:url elements cannot be
>> processed (e.g. are "bittorrent" etc and this is not supported by the
>> Metalink Processor, or the servers are down).
>>
>>> Lastly, it might be possible to do something based on the 'initial
>>> digit', e.g.
>>> if the initial digit is higher, all servers with lower
>>> digits should not be used (unless the higher ones cannot be
>>> processed), and the others should have their work distributed evenly
>>> based on the minor digit. For example if you have three resources with
>>> preferences of 89, 91 and 95 - the one with 89 would not be used
>>> (unless the other two can't be used), and the processor would try and
>>> distribute more work to the resource with a value of 95 than the one
>>> with 91 (e.g. 5 times more, or something along those lines - or you
>>> could leave the exact distribution down to the metalink processor). I
>>> think this sort of behavior could be no stronger than a SHOULD though.
>>
>> This could be interesting, I want to consult the authors of Metalink
>> clients first though.
>>
>>> (4):
>>>
>>> " In this example, a subdirectory debian-amd64/sarge/ will be created
>>> and a file named Contents-amd64.gz will be created inside it. The
>>> path MUST be relative. The path MUST NOT begin with a "/" or contain
>>> "../" or "./". Metalink Processors MUST NOT allow directory traversal."
>>>
>>> I think the actual correct form for this should be:
>>>
>>> " In this example, a subdirectory debian-amd64/sarge/ will be created
>>> and a file named Contents-amd64.gz will be created inside it. The
>>> path MUST be relative. The path MUST NOT begin with a "/", "./" or
>>> "../", contain "/../", or end with "/..". Metalink Processors MUST NOT
>>> allow directory traversal."
>>>
>>> (./ at the start could cause some badly written applications to change
>>> to their current directory, but /./ anywhere else should be fine I
>>> think).
>>>
>>> I think it would be good if you could get a second opinion on this
>>> wording from someone who knows this a bit better than I.
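
The path restrictions proposed in (4) can be sketched as a small check,
here in Python. is_safe_relative_path is a hypothetical helper, not
something defined in the draft. It follows the quoted wording literally
for "/"-separated names, plus one labeled extra: a bare ".." is also
rejected, which the wording above does not mention. Backslash separators
and other filesystem-specific cases are not handled.

```python
def is_safe_relative_path(name: str) -> bool:
    """Check a file "name" against the wording proposed in (4).

    Rejects names that begin with "/", "./" or "../", that contain
    "/../", or that end with "/..".
    """
    if name == "..":
        # Assumption beyond the quoted wording: a bare ".." is rejected too.
        return False
    if name.startswith(("/", "./", "../")):
        return False
    if "/../" in name or name.endswith("/.."):
        return False
    return True
```

Under this check "debian-amd64/sarge/Contents-amd64.gz" is accepted, and
"a/./b" is accepted as well, matching the remark that "/./" anywhere
other than the start should be fine.
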
>>
>> I've fixed it & hopefully we'll have corrections :)
>>
>>> (5) It might also be worth adding information as to how to deal with
>>> characters which are invalid in the filesystem - I'd suggest something
>>> like:
>>>
>>> "A Metalink Processor MAY alter the name of the subdirectory or file
>>> if they contain characters which are invalid in the destination
>>> filesystem."
>>>
>>> (that way it can be left to the processor itself to decide what to
>>> rename it to on any particular filesystem, or even reject it if
>>> desired).
>>
>> That sounds good, added.
>>
>>> (6) "What do you suggest about dealing with multiple hash types?" -
>>> obviously it would be better for a processor to check multiple hashes,
>>> as it's a good way to prevent malicious altering of the files. This
>>> needs to be left down to the metalink processor though. Something
>>> like:
>>>
>>> "When multiple hash types are provided, a Metalink Processor
>>> MAY verify using more than one of these hash types".
>>
>> Added. Currently, I think most only do one.
>>
>>> Also you write:
>>>
>>> "An issue could be if someone malicious makes a metalink where the MD5
>>> matches that of something published by a legit group, but also
>>> includes a SHA-256 checksum, and if clients prefer & only verify
>>> SHA-256, then the file could appear to be good even if the downloader
>>> looked inside the metalink & compared the MD5 (if the legit group
>>> didn't also use & publish SHA-256 checksums too)."
>>>
>>> That's an interesting case, but if their metalink processor didn't
>>> support md5 and only supported SHA-256, it'd be the same scenario too.
>>> I don't think this is too much of a concern. It's not unreasonable to
>>> change the "MAY" suggested above for checking multiple hash types to a
>>> "SHOULD", but for reasons such as performance, this might not always
>>> be desirable, and I'm not sure you'd gain as much security as you
>>> might think from doing this.
>>> Most people don't even know what md5 etc
>>> are, and a user who cares about this will probably make sure their
>>> metalink processor is one which checks all hashes.
>>>
>>> It's quite in keeping with standards to write something like this:
>>>
>>> "Metalink processors are encouraged to check all hash types given
>>> which they are able to process"
>>>
>>> This will probably lead to checking all as the common behavior while
>>> not preventing people choosing not to do this for whatever reason.
>>
>> Yes, I think it ultimately comes down to downloading .metalinks from
>> someone you trust, just like any other type of download.
>>
>>> (7): "is the only thing changed?" - also the indentation changed (e.g.
>>> <description> should be indented one more level, as it's a child of
>>> <file>).
>>
>> Hopefully I've got this right now, let me know.
>>
>>> (8) An additional point I've just noticed - it specifies that it
>>> should ignore resources with a "type" of bittorrent if it is
>>> unsupported. ("Metalink Processors that do not support BitTorrent MUST
>>> ignore" ...) [as changed in your additional email from should]. This
>>> should probably be removed from the bittorrent subsection (point 6 in
>>> 4.2.17.2), and moved to above the list in 4.2.17.2, and state
>>> something like "Metalink Processors that do not support a specified
>>> type of resource MUST ignore that resource". This is both future
>>> proof, and handles the case of Metalink Processors not supporting ed2k
>>> etc.
>>
>> Good point.
>>
>>> Best wishes
>>>
>>> Ian Macfarlane
>>>
>>> ps: No objections to forwarding any of this to the metalink list.
>>
>> Thanks!
>>
>>> 2008/8/28 Anthony Bryan <[EMAIL PROTECTED]>:
>>>> Hi Ian,
>>>>
>>>> Great comments, thank you so much for taking the time to examine this!
>>>> These are issues that needed to be addressed.
>>>>
>>>> Do you mind if I forward this to the metalink-discussion list?
>>>>
>>>> I'll put the changes here, let me know if they are an improvement, or
>>>> suggest a change.
>>>>
>>>> On Wed, Aug 27, 2008 at 7:33 AM, Ian Macfarlane <[EMAIL PROTECTED]> wrote:
>>>>> A few comments regarding the draft at
>>>>> http://tools.ietf.org/html/draft-bryan-metalink-00
>>>>>
>>>>> (1) With regards to the "type" attribute of the metalink:url element
>>>>> in 4.2.17.2, I think it should be made clear that this overrides any
>>>>> file extension sniffing specified in 4.2.17.
>>>>
>>>> 4.2.17.2. The "type" Attribute
>>>>
>>>> metalink:url elements MAY have a "type" attribute that indicates the
>>>> IRI type. The "type" attribute overrides any file extension sniffing
>>>> specified above.
>>>>
>>>>> (2) With regards to the metalink:url element in 4.2.17, it is not
>>>>> clear if the IRI must end with ".torrent", or if the path should, e.g.
>>>>> does http://example.com/file.torrent?id=1 count? What about
>>>>> http://example.com/generate.php?file.torrent
>>>>
>>>> 6. The value "bittorrent" signifies that the IRI leads to a
>>>> BitTorrent .torrent file as specified in [BITTORRENT]. Metalink
>>>> Processors that do not support BitTorrent should ignore this type
>>>> and also ignore metalink:url elements which retrieve files that
>>>> end with the extension ".torrent".
>>>>
>>>>> (3) Also with regards to the "type" attribute of the metalink:url
>>>>> element in 4.2.17.2, it's slightly inconsistent to allow both
>>>>> http/https/ftp/etc as well as "bittorrent" as types, as a .torrent
>>>>> file itself can be sent over any of these protocols. There is explicit
>>>>> information about the protocol from the scheme in these URLs. I would
>>>>> suggest "direct" (or omit altogether) for this type of file, and the
>>>>> Metalink Processor should infer the protocol from the scheme used.
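
To make points (2) and (3) concrete, here is a small Python sketch using
the standard library's urllib.parse.urlsplit. The two helper names are
hypothetical and the draft does not prescribe either check. The sketch
only shows how "the IRI ends with .torrent" and "the path ends with
.torrent" differ for the two example addresses, and how the transfer
protocol can be read from the scheme.

```python
from urllib.parse import urlsplit


def path_ends_with_torrent(iri: str) -> bool:
    """True when the path component (not the query) ends with ".torrent"."""
    return urlsplit(iri).path.lower().endswith(".torrent")


def transfer_scheme(iri: str) -> str:
    """Return the scheme of the IRI, e.g. "http" or "ftp"."""
    return urlsplit(iri).scheme
```

For http://example.com/file.torrent?id=1 the path check is true although
the full string does not end with ".torrent". For
http://example.com/generate.php?file.torrent it is the other way round,
because ".torrent" appears only in the query.
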
>>>>
>>>> What about requiring that "bittorrent" is used as a "type" attribute
>>>> since .torrent files can be acquired from multiple methods, & just
>>>> examining the IRI as you mentioned can be misleading?
>>>>
>>>>> (4) With regards to the metalink:url element "preference" attribute in
>>>>> 4.2.17.1, it is not entirely clear if a Metalink Processor which can
>>>>> download simultaneously should download from two locations where one
>>>>> has a lower priority. A comment such as "A Metalink Processor SHOULD
>>>>> do xxx" would be helpful.
>>>>
>>>> I'm not sure what to put here :)
>>>>
>>>> This sentence would accurately describe what they do now. In our
>>>> pre-ID version, we have a "maxconnections" attribute where you can
>>>> limit the number of segments for a download.
>>>>
>>>> "A Metalink Processor MAY download different segments of a file from
>>>> more than one IRI simultaneously, and when doing so SHOULD first use
>>>> the highest priority IRIs and then use lower ones."
>>>>
>>>>> (5) With regards to the "name" attribute in 4.1.3.1, instead of "Only
>>>>> relative paths are allowed" I think using the formal restrictive
>>>>> language of the standards process is a good idea here, e.g. "The path
>>>>> MUST be relative". It might also be a good idea to add that the path
>>>>> MUST NOT begin with a "/" or contain "../" (and possibly "./")
>>>>> [technically this should be starting with ../ or ./ or containing /..
>>>>> or /. I think].
>>>>
>>>> In this example, a subdirectory debian-amd64/sarge/ will be created
>>>> and a file named Contents-amd64.gz will be created inside it. The
>>>> path MUST be relative. The path MUST NOT begin with a "/" or contain
>>>> "../" or "./". Metalink Processors MUST NOT allow directory traversal.
>>>>
>>>>> (6) Under 4.1.4 where it says "This specification assigns no
>>>>> significance to the order of metalink:url elements" it might be useful
>>>>> to include a reference to the "preference" attribute.
>>>>
>>>> This specification assigns no significance to the order of metalink:
>>>> url elements. Significance is determined by the value of the
>>>> "preference" attribute of the metalink:url elements.
>>>>
>>>>> (7) With regards to verification (4.1.6.1 and 4.2.4.1) there is no
>>>>> information as to how a Metalink Processor should deal with one (can
>>>>> it ignore it) or deal with multiple hash types (e.g. if there is MD5
>>>>> and SHA1, MUST / MAY / MUST NOT it check more than one?). Also, it
>>>>> might be useful to extend metalink documents with new verification
>>>>> methods before they arrive in the standard. Perhaps unknown types
>>>>> could be allowed here? The same comments mostly apply to digital
>>>>> signatures too (4.2.13).
>>>>
>>>> I agree that unknown hash types or digital signatures should be allowed.
>>>>
>>>> This document defines nine initial values for hash types. It may be
>>>> useful to extend Metalink documents with new verification methods, so
>>>> unknown types are allowed.
>>>>
>>>> and
>>>>
>>>> metalink:signature elements MUST have a "type" attribute. The initial
>>>> value of "type" is the string that is non-empty and matches "pgp".
>>>> It may be useful to extend Metalink documents with new types of
>>>> digital signatures, so unknown types are allowed.
>>>>
>>>>
>>>> What do you suggest about dealing with multiple hash types?
>>>>
>>>> An issue could be if someone malicious makes a metalink where the MD5
>>>> matches that of something published by a legit group, but also
>>>> includes a SHA-256 checksum, and if clients prefer & only verify
>>>> SHA-256, then the file could appear to be good even if the downloader
>>>> looked inside the metalink & compared the MD5 (if the legit group
>>>> didn't also use & publish SHA-256 checksums too).
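
On the multiple hash types question, a minimal Python sketch of a
processor that checks every hash type it knows and skips unknown ones.
check_hashes and the SUPPORTED table are hypothetical; the type names
"md5" and "sha1" appear in the example XML in this thread, while
"sha256" as the name for SHA-256 is an assumption. What to do with the
results (accept, warn, reject) is left to the caller, since the thread
does not settle that.

```python
import hashlib
from typing import Dict

# Hash types this hypothetical processor can compute.
# "sha256" as a type name is an assumption, not taken from the draft.
SUPPORTED = {
    "md5": hashlib.md5,
    "sha1": hashlib.sha1,
    "sha256": hashlib.sha256,
}


def check_hashes(data: bytes, hashes: Dict[str, str]) -> Dict[str, bool]:
    """Verify data against every listed hash whose type is supported.

    Returns a mapping from hash type to whether it matched. Unknown
    types are skipped rather than treated as failures.
    """
    results = {}
    for hash_type, expected in hashes.items():
        algo = SUPPORTED.get(hash_type.lower())
        if algo is None:
            continue  # unknown type: allowed in the document, ignored here
        results[hash_type] = algo(data).hexdigest() == expected.strip().lower()
    return results
```

A processor that wants the "check all hash types it is able to process"
behaviour could accept a file only when every value in the returned
mapping is true.
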
>>>>
>>>>> (8) Formatting nit last - the use of spacing in the nesting of XML
>>>>> elements is pretty inconsistent - so instead of this on page 4:
>>>>>
>>>>> <?xml version="1.0" encoding="UTF-8" ?>
>>>>> <metalink version="3.0" xmlns="http://metalinker.org">
>>>>> <published>2008-05-15T12:23:23Z</published>
>>>>> <files>
>>>>> <file name="example.ext">
>>>>> <description>A description of the example file for download.
>>>>> </description>
>>>>> <verification>
>>>>> <hash type="md5">83b1a04f18d6782cfe0407edadac377f</hash>
>>>>> <hash type="sha1">80bc95fd391772fa61c91ed68567f0980bb45fd9
>>>>> </hash>
>>>>> </verification>
>>>>> <resources>
>>>>> <url>ftp://ftp.example.com/example.ext</url>
>>>>> <url>http://example.com/example.ext</url>
>>>>> <url>http://example.com/example.ext.torrent</url>
>>>>> </resources>
>>>>> </file>
>>>>> </files>
>>>>> </metalink>
>>>>>
>>>>> It would be much nicer if it were nested something like this:
>>>>>
>>>>> <?xml version="1.0" encoding="UTF-8" ?>
>>>>> <metalink version="3.0" xmlns="http://metalinker.org">
>>>>> <published>2008-05-15T12:23:23Z</published>
>>>>> <files>
>>>>> <file name="example.ext">
>>>>> <description>A description of the example file for
>>>>> download.</description>
>>>>> <verification>
>>>>> <hash type="md5">83b1a04f18d6782cfe0407edadac377f</hash>
>>>>> <hash type="sha1">80bc95fd391772fa61c91ed68567f0980bb45fd9
>>>>> </hash>
>>>>> </verification>
>>>>> <resources>
>>>>> <url>ftp://ftp.example.com/example.ext</url>
>>>>> <url>http://example.com/example.ext</url>
>>>>> <url>http://example.com/example.ext.torrent</url>
>>>>> </resources>
>>>>> </file>
>>>>> </files>
>>>>> </metalink>
>>>>
>>>> Done, if:
>>>>
>>>> <description>A description of the example file for
>>>> download.</description>
>>>>
>>>> is the only thing changed?
>>>>
--
(( Anthony Bryan ...
Metalink [ http://www.metalinker.org ] ))
Easier, More Reliable, Self Healing Downloads
_______________________________________________
Int-area mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/int-area
