FYI: The new RFC5854 re: "Metalink Download Description Format" defines a
"hash" element in its section 4.2.4. They have chosen to store the hash
algorithm in a separate XML component from the hash value (a "type"
attribute, or, for piece hashes, the enclosing "pieces" element). That
is, in my opinion, unfortunate. I *do not* suggest that we follow their
lead. I'm just posting this here so that folks are aware of what others
are doing in other contexts.
The relevant text from RFC5854 follows:
4.2.4. The "metalink:hash" Element
The "metalink:hash" element is a Text construct that conveys a
cryptographic hash for a file. All hashes are encoded in lowercase
hexadecimal format. Hashes are used to verify the integrity of a
complete file or portion of a file to determine if the file has been
transferred without any errors.
   metalinkHash =
      element metalink:hash {
         attribute type { text }?,
         text
      }
Metalink Documents MAY contain one or multiple hashes of a complete
file. metalink:hash elements with a "type" attribute MUST contain a
hash of the complete file. In this example, both SHA-1 and SHA-256
hashes of the complete file are included.
...
<hash type="sha-1">a97fcf6ba9358f8a6f62beee4421863d3e52b080</hash>
<hash type="sha-256">fc87941af7fd7f03e53b34af393f4c14923d74...</hash>
...
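A minimal Python sketch (not from RFC5854) of how a consumer might read the typed full-file hashes above and check a downloaded file against them. The mapping from IANA hash names to hashlib names, the function names, and the wrapping `<file>` element are assumptions for illustration; namespace prefixes are stripped so the sketch works with or without the Metalink namespace.

```python
import hashlib
import xml.etree.ElementTree as ET

# Assumed mapping from IANA "Hash Function Textual Names" to hashlib names.
IANA_TO_HASHLIB = {
    "md5": "md5",
    "sha-1": "sha1",
    "sha-224": "sha224",
    "sha-256": "sha256",
    "sha-384": "sha384",
    "sha-512": "sha512",
}


def local_name(tag):
    # Drop a "{namespace}" prefix, if any, from an ElementTree tag.
    return tag.rsplit("}", 1)[-1]


def full_file_hashes(file_elem):
    """Return {type: hex} for direct <hash type="..."> children (full-file hashes)."""
    result = {}
    for child in file_elem:
        if local_name(child.tag) == "hash" and "type" in child.attrib:
            result[child.attrib["type"].lower()] = (child.text or "").strip().lower()
    return result


def verify_full_file(data, hashes):
    """True if at least one known algorithm was checked and every known one matched."""
    checked = False
    for algo, expected in hashes.items():
        name = IANA_TO_HASHLIB.get(algo)
        if name is None:
            continue  # unknown algorithm: skip it rather than fail
        checked = True
        if hashlib.new(name, data).hexdigest() != expected:
            return False
    return checked


# Usage with a hypothetical <file> wrapper and digests computed on the fly.
doc = ET.fromstring(
    "<file name='hello.txt'>"
    f"<hash type='sha-1'>{hashlib.sha1(b'hello').hexdigest()}</hash>"
    f"<hash type='sha-256'>{hashlib.sha256(b'hello').hexdigest()}</hash>"
    "</file>"
)
hashes = full_file_hashes(doc)
print(verify_full_file(b"hello", hashes))  # True
```

Skipping unknown algorithms (instead of failing) is a design choice of this sketch; a stricter consumer could require every listed hash to be checkable.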
Metalink Documents MAY also contain hashes for individual pieces of a
file. metalink:hash elements that are inside a metalink:pieces
container element have a hash for that specific piece or chunk of the
file, and are of the same hash type as the metalink:pieces element in
which they are contained. Metalink Documents MAY contain one or
multiple metalink:pieces container elements, if each "type" attribute
of metalink:pieces has a unique value.
metalink:hash elements without a "type" attribute MUST contain a hash
for that specific piece or chunk of the file and MUST be listed in
the same order as the corresponding pieces appear in the file,
starting at the beginning of the file. The size of the piece is
equal to the value of the "length" attribute of the metalink:pieces
element, apart from the last piece, which is the remainder. See
Section 4.1.3.2 for more information on the size of pieces.
In this example, SHA-1 and SHA-256 hashes of the complete file are
included, along with four SHA-1 piece hashes.
...
<hash type="sha-1">a97fcf6ba9358f8a6f62beee4421863d3e52b080</hash>
<hash type="sha-256">fc87941af7fd7f03e53b34af393f4c14923d74...</hash>
<pieces length="1048576" type="sha-1">
  <hash>d96b9a4b92a899c2099b7b31bddb5ca423bb9b30</hash>
  <hash>10d68f4b1119014c123da2a0a6baf5c8a6d5ba1e</hash>
  <hash>3e84219096435c34e092b17b70a011771c52d87a</hash>
  <hash>67183e4c3ab892d3ebe8326b7d79eb62d077f487</hash>
</pieces>
...
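The piece rules quoted above (hashes listed in file order, every piece `length` bytes except the last, which is the remainder) can be sketched in Python as follows. The function names and reporting mismatched piece indexes are assumptions; `hashlib_name` must be the hashlib spelling of the `pieces` element's "type" (e.g. "sha1" for "sha-1"). Under this sketch, the four hashes in the RFC example imply a file larger than 3 x 1048576 bytes and at most 4 x 1048576 bytes.

```python
import hashlib


def piece_hashes(data, length, hashlib_name="sha1"):
    """Hash consecutive `length`-byte pieces; the last piece is whatever remains."""
    if length <= 0:
        raise ValueError("piece length must be positive")
    return [
        hashlib.new(hashlib_name, data[i:i + length]).hexdigest()
        for i in range(0, len(data), length)
    ]


def bad_pieces(data, length, expected, hashlib_name="sha1"):
    """Indexes of pieces whose hash differs from the listed hashes (in file order)."""
    actual = piece_hashes(data, length, hashlib_name)
    if len(actual) != len(expected):
        raise ValueError(f"{len(expected)} hashes listed, data has {len(actual)} pieces")
    return [i for i, (a, e) in enumerate(zip(actual, expected)) if a != e.lower()]


# Usage: 10 bytes with length=4 gives pieces of 4, 4 and 2 bytes.
listed = piece_hashes(b"0123456789", 4)
print(len(listed))                           # 3
print(bad_pieces(b"0123X56789", 4, listed))  # [1]
```

Per-piece hashes let a client re-fetch only the damaged piece (index 1 above) instead of the whole file.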
On Sun, May 16, 2010 at 11:29 PM, James Snell wrote:
> Ok, although I seriously dislike having to do additional parsing on
> attribute values, the arguments made so far are valid and parsing hex
> encoded hash digests is -- fortunately -- quite simple to do. So let's
> go with the following syntax...
>
> hash = attribute hash { hash-list }
> hash-list = # ( token ":" 1*HEX )
>
> The token and HEX productions are defined by RFC2616...
>
> The spec would defer to the existing IANA registry for hash functions
> to define the "tokens".
>
> This would result in a syntax of...
>
> hash="md5:abc...xyz, sha-1:123...567, sha-512:xyz...abc"
>
> Does this seem acceptable to everyone?
>
> - James
>
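A minimal Python sketch of a parser for the proposed `hash-list` syntax. The `token` character class follows RFC2616's definition (any CHAR except controls and separators), and empty list elements are skipped as the RFC2616 `#rule` allows. Treating algorithm names as case-insensitive and rejecting a repeated algorithm are assumptions, not part of the proposal; the sample digests are borrowed from later in this thread because placeholders like "abc...xyz" are not hex.

```python
import re

# RFC2616 token: 1*<any CHAR except CTLs or separators>, as a character class.
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
ENTRY = re.compile(rf"({TOKEN}):([0-9A-Fa-f]+)")


def parse_hash_list(value):
    """Parse 'md5:..., sha-1:...' into {algorithm: lowercase hex digest}."""
    hashes = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # the RFC2616 #rule tolerates empty list elements
        m = ENTRY.fullmatch(item)
        if m is None:
            raise ValueError(f"malformed hash-list entry: {item!r}")
        algo = m.group(1).lower()  # assumption: names compare case-insensitively
        if algo in hashes:
            raise ValueError(f"algorithm listed twice: {algo}")  # assumption
        hashes[algo] = m.group(2).lower()
    return hashes


print(parse_hash_list(
    "md5:6705f99eccedeac20e969bef954c5fb0, "
    "sha-1:bc608e6d3d339d1a7afc406a7ea6a8f07358038b"
))
```

Because ":" is a separator in RFC2616, it can never appear inside a token, so the first colon in each entry unambiguously ends the algorithm name.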
> On Sat, May 15, 2010 at 11:46 PM, Sam Johnston wrote:
> > James,
> >
> > In consideration of former (CRC) and future (AHS) hashing functions I think
> > it's critical to support extensibility and multiple hashes. I like that XML
> > digsigs use anyURIs to identify hashes (e.g. <DigestMethod
> > Algorithm="http://www.w3.org/2000/09/xmldsig#sha1">), but one could argue
> > this unnecessarily complicates what should be a simple syntax.
> >
> > I was about to propose an IANA registry for hash functions, but one already
> > exists (Hash Function Textual Names as specified by RFC4572), so it would
> > make sense to use it rather than inventing our own mechanism, even if we
> > have to update the registry rules to allow for algorithms specified by URI
> > rather than RFC.
> >
> > While Atom is an XML format and should arguably follow XML conventions,
> > there is precedent for prefixing hashes with the name of the hashing
> > function using e.g. colons or curly braces. I think it's more important to
> > keep the XML syntax simple, and in any case the hash and hash function
> > should be tightly bound as they are useless independently.
> >
> > All that considered, I think the best approach is to allow for a
> > multi-valued "hash" attribute à la:
> > <link rel="alternate" href="http://example.com/"
> > hash="md5:6705f99eccedeac20e969bef954c5fb0
> > sha-1:bc608e6d3d339d1a7afc406a7ea6a8f07358038b" />
> > and/or
> > <link rel="alternate" href="http://example.com/thing.pdf"
> > hash="md5:6705f99eccedeac20e969bef954c5fb0"
> > hash="sha-1:bc608e6d3d339d1a7afc406a7ea6a8f07358038b" />
> > Sam
> > Google
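The second form repeats the `hash` attribute on one element, which XML 1.0 does not allow (the "Unique Att Spec" well-formedness constraint), so a conforming parser rejects it; the first, whitespace-separated form parses normally. A minimal Python sketch using the standard library's ElementTree; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def hashes_from_link(link_xml):
    """Split a whitespace-separated hash attribute into {algorithm: hex digest}."""
    link = ET.fromstring(link_xml)
    result = {}
    # split() treats the line break in the value (or the space it is
    # normalized to by the parser) as a separator.
    for item in link.get("hash", "").split():
        algo, sep, digest = item.partition(":")
        if not (sep and algo and digest):
            raise ValueError(f"malformed hash value: {item!r}")
        result[algo.lower()] = digest.lower()
    return result


single = ('<link rel="alternate" href="http://example.com/" '
          'hash="md5:6705f99eccedeac20e969bef954c5fb0\n'
          'sha-1:bc608e6d3d339d1a7afc406a7ea6a8f07358038b" />')
print(hashes_from_link(single))

repeated = ('<link href="http://example.com/thing.pdf" '
            'hash="md5:6705f99eccedeac20e969bef954c5fb0" '
            'hash="sha-1:bc608e6d3d339d1a7afc406a7ea6a8f07358038b" />')
try:
    hashes_from_link(repeated)
except ET.ParseError as err:
    print("rejected:", err)  # duplicate attribute
```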
> > On Sat, May 15, 2010 at 1:15 AM, James Snell wrote:
> >>
> >> Good argument Bob... ok... stewing over this a bit more. I generally
> >> dislike having to do additional parsing of attribute/element values
> >> but there are very good reasons for keeping this as a single "hash"
> >> attribute and you make a compelling case.
> >>
> >> On Fri, May 14, 2010 at 1:26 PM, Bob Wyman wrote:
> >> > James Snell wrote:
> >> >> <link href="foo" md5="abc...xyz">
> >> >> <media:hash algo="GOST">123...456</media:hash>
> >> >> </link>
> >> >
> >> > The alternative approach, which would support both a variety and
> >> > multiplicity of hashes, would look like this:
> >> > <link href="foo" hash="gost:123123..., md5:0928402948...,
> >> > sha256:098078097..."/>
> >> > This strikes me as "simpler" than the hybrid approach. Just a few of my
> >> > concerns with the proposed "hybrid" approach follow:
> >> >
> >> > I like binding the algorithm and value together into a single value
> >> > since I know of no compelling case for processing one element in
> >> > isolation of the other. The hash value only makes sense if you know the
> >> > algorithm, and the algorithm is only useful when bound to a specific hash
> >> > value. Thus, it strikes me as simply introducing syntactic sugar to
> >> > specify the algorithm using a different XML component than the value.
> >> >
> >> > These values are likely to be stored in databases and otherwise
> >> > manipulated. In all cases, for the data to be meaningful, people will
> >> > need to keep the binding between algorithm and hash value. It is likely
> >> > that storing a single string value is going to be easier for folk than
> >> > dealing with a multi-part value. Also, consider the effect of parsers...
> >> > It is likely that in order to transfer a value from an entry into a
> >> > database field, what you'll need to do is extract both algorithm and
> >> > hash value from the parse tree and then construct some string that
> >> > combines them. This would be particularly useful if you want to use the
> >> > hash value as a database key (a very reasonable thing to do...) You
> >> > could build and store the string "algo='GOST'>123...456<" or your
> >> > database might support concatenated fields, or you could build
> >> > "gost.123...456". I think I would go with the latter.
> >> >
> >> > Defining distinct attributes for each hash algorithm pushes unnecessary
> >> > syntactical complexity to the global level and thus increases the
> >> > complexity not only of the specification but also of all applications,
> >> > no matter which algorithms they understand or if they understand any at
> >> > all. It also makes extending the list of supported algorithms
> >> > "expensive", since such extensions require modification to the standard
> >> > rather than just a registry entry. What benefit do we get from having
> >> > these algorithm types defined at the global syntax level?
> >> >
> >> > The hybrid approach looks very complicated to me. It means that I'll
> >> > have two very different places in which hash values might be found and
> >> > two very different syntaxes for expressing them. The result is going to
> >> > be more complex code than would otherwise be the case. What value comes
> >> > from using the hybrid approach?
> >> >
> >> > One argument for hybrid is that these elements exist already in other
> >> > specs. I wonder if it isn't possible that those other specs might have
> >> > approached the problem in a non-optimal fashion. Does it really make
> >> > sense to import syntax if there isn't a really good case that
> >> > demonstrates that doing so is the best approach?
> >> >
> >> > I am unaware of any hash algorithms that need anything other than the
> >> > specification of the algorithm and the value in order to be useful. If
> >> > there were broadly used algorithms that had more complex meta-data
> >> > requirements, it would be easier to understand the appeal of the hybrid
> >> > approach.
> >> >
> >> > I can't think of any reason why it is *useful* to separate the algorithm
> >> > from the hash value. Can someone enlighten me here? What computation,
> >> > storage or communication task becomes easier if you have these two
> >> > separated?
> >> >
> >> > bob wyman
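A minimal Python sketch of the single-string approach Bob describes: algorithm and hash value are combined into one key (here "gost.<hex>") and used as a database primary key. The `hash_key`/`split_hash_key` helpers, the table layout, and the in-memory SQLite database are hypothetical. Because a hex digest never contains ".", splitting on the last "." recovers both parts even if an algorithm name contained a dot.

```python
import sqlite3


def hash_key(algorithm, hex_digest):
    """Bind algorithm and digest into one string, e.g. 'gost.0123abcd'."""
    return f"{algorithm.lower()}.{hex_digest.lower()}"


def split_hash_key(key):
    # The digest is hex-only, so the last "." always separates the two parts.
    algorithm, sep, digest = key.rpartition(".")
    if not sep:
        raise ValueError(f"not a hash key: {key!r}")
    return algorithm, digest


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE resources (hash_key TEXT PRIMARY KEY, href TEXT)")
db.execute("INSERT INTO resources VALUES (?, ?)", (hash_key("GOST", "0123ABCD"), "foo"))
row = db.execute(
    "SELECT href FROM resources WHERE hash_key = ?", (hash_key("gost", "0123abcd"),)
).fetchone()
print(row)                              # ('foo',)
print(split_hash_key("gost.0123abcd"))  # ('gost', '0123abcd')
```

Normalizing case inside `hash_key` is what lets "GOST"/"0123ABCD" and "gost"/"0123abcd" map to the same row.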
> >> > On Fri, May 14, 2010 at 3:06 PM, James Snell wrote:
> >> >>
> >> >> Ok, I've been giving this some more thought and I think a hybrid
> >> >> approach works very well. As has been pointed out a number of times in
> >> >> this thread, there are existing elements in other namespaces that
> >> >> provide an algorithm/hash pairing. I think that the Link Extensions
> >> >> Draft can provide attributes for the most basic hash algorithms, and
> >> >> applications that require hash algorithms that are not covered can
> >> >> fall back to the extension elements.
> >> >>
> >> >> e.g.
> >> >>
> >> >> <link href="foo" md5="abc...xyz">
> >> >> <media:hash algo="GOST">123...456</media:hash>
> >> >> </link>
> >> >>
> >> >> This would allow for the most common cases to be easily covered while
> >> >> allowing for the full range of possible cases to be handled as well.
> >> >>
> >> >> - James
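To make the cost of the hybrid form concrete, here is a minimal Python sketch of a consumer that has to look in both places: per-algorithm attributes on `link` and `media:hash` extension children. Binding `media:` to the Media RSS namespace, and the set of per-algorithm attribute names, are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"  # assumed binding for the media: prefix
ATTRIBUTE_ALGORITHMS = ("md5",)  # hypothetical: algorithms given their own attribute


def hybrid_hashes(link_xml):
    """Collect {algorithm: hex} from both places the hybrid proposal allows."""
    link = ET.fromstring(link_xml)
    hashes = {}
    # Place 1: one attribute per algorithm, e.g. md5="..."; the name is the algorithm.
    for algo in ATTRIBUTE_ALGORITHMS:
        if algo in link.attrib:
            hashes[algo] = link.attrib[algo].lower()
    # Place 2: extension elements, where the algorithm is a separate "algo" attribute.
    for el in link.findall(f"{{{MEDIA_NS}}}hash"):
        hashes[el.get("algo", "").lower()] = (el.text or "").strip().lower()
    return hashes


sample = (
    '<link xmlns:media="http://search.yahoo.com/mrss/" href="foo" '
    'md5="6705f99eccedeac20e969bef954c5fb0">'
    '<media:hash algo="GOST">0123abcd</media:hash>'
    "</link>"
)
print(hybrid_hashes(sample))
```

Two lookup paths with two different ways of naming the algorithm is exactly the extra code path the single-attribute proposals avoid.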
> >> >>
> >> >> On Wed, May 12, 2010 at 8:50 PM, Richard Salz wrote:
> >> >> >> So the key question is: what are the main algorithms we need to
> >> >> >> provide attributes for?
> >> >> >
> >> >> > This is a hard question to answer -- especially for hash/digest
> >> >> > algorithms, which tend to fall more rapidly than vetted crypto
> >> >> > algorithms.
> >> >> >
> >> >> > It's more verbose, but I strongly recommend using a pair of attributes
> >> >> > to represent algorithm/value. Use the URIs defined in the latest XML
> >> >> > DSIG document, perhaps with the "trick" that relative URIs are a
> >> >> > shorthand for the xmldsig namespace.
> >> >> >
> >> >> > /r$
> >> >> >
> >> >> > --
> >> >> > STSM, WebSphere Appliance Architect
> >> >> > https://www.ibm.com/developerworks/mydeveloperworks/blogs/soma/
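A minimal Python sketch of the relative-URI shorthand, assuming standard RFC 3986 resolution against the xmldsig namespace URI. One wrinkle: because that namespace ends in "#", only a fragment reference such as "#sha1" resolves to the xmldsig identifier; a bare "sha1" resolves to a sibling path instead, so the shorthand would have to be written as a fragment (or be defined as plain string concatenation). The attribute names `hash-alg` and `hash` are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

XMLDSIG_NS = "http://www.w3.org/2000/09/xmldsig#"


def expand_algorithm(uri_ref):
    """Resolve a possibly relative algorithm URI against the xmldsig namespace."""
    return urljoin(XMLDSIG_NS, uri_ref)


def algorithm_and_value(link_xml):
    """Read a hypothetical hash-alg/hash attribute pair from a link element."""
    link = ET.fromstring(link_xml)
    return expand_algorithm(link.get("hash-alg", "")), link.get("hash", "").lower()


print(expand_algorithm("#sha1"))  # http://www.w3.org/2000/09/xmldsig#sha1
print(expand_algorithm("sha1"))   # http://www.w3.org/2000/09/sha1
print(algorithm_and_value(
    '<link href="foo" hash-alg="#sha1" '
    'hash="bc608e6d3d339d1a7afc406a7ea6a8f07358038b"/>'
))
```

Absolute URIs such as "http://www.w3.org/2001/04/xmlenc#sha256" pass through `urljoin` unchanged, so the shorthand and full identifiers can coexist.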
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >>
> >> >> --
> >> >> - James Snell
> >> >> http://www.snellspace.com