James,

In consideration of former (CRC) and future (AHS) hashing functions, I think it's critical to support extensibility and multiple hashes. I like that XML digital signatures use anyURIs <http://www.w3.org/TR/xmldsig-core/#sec-DigestMethod> to identify hashes (e.g. <DigestMethod Algorithm="http://www.w3.org/2000/09/xmldsig#sha1">), but one could argue this unnecessarily complicates what should be a simple syntax.
I was about to propose an IANA registry for hash functions, but one already exists (Hash Function Textual Names <http://www.iana.org/assignments/hash-function-text-names/hash-function-text-names.xml>, as specified by RFC 4572 <http://tools.ietf.org/html/rfc4572#section-8>), so it would make sense to use it rather than inventing our own mechanism, even if we have to update the registry rules to allow for algorithms specified by URI rather than RFC.

While Atom is an XML format and should arguably follow XML conventions, there is precedent for prefixing hashes with the name of the hashing function using e.g. colons or curly braces <http://users.ameritech.net/mhwood/ldap-sec-setup.html>. I think it's more important to keep the XML syntax simple, and in any case the hash and hash function should be tightly bound, as they are useless independently.

All that considered, I think the best approach is to allow for a multi-valued "hash" attribute à la:

<link rel="alternate" href="http://example.com/" hash="md5:6705f99eccedeac20e969bef954c5fb0 sha-1:bc608e6d3d339d1a7afc406a7ea6a8f07358038b" />

and/or

<link rel="alternate" href="http://example.com/thing.pdf" hash="md5:6705f99eccedeac20e969bef954c5fb0" hash="sha-1:bc608e6d3d339d1a7afc406a7ea6a8f07358038b" />

Sam
Google

On Sat, May 15, 2010 at 1:15 AM, James Snell <[email protected]> wrote:
>
> Good argument Bob... ok... stewing over this a bit more. I generally
> dislike having to do additional parsing of attribute/element values
> but there are very good reasons for keeping this as a single "hash"
> attribute and you make a compelling case.
>
> On Fri, May 14, 2010 at 1:26 PM, Bob Wyman <[email protected]> wrote:
> > James Snell <[email protected]> wrote:
> >> <link href="foo" md5="abc...xyz">
> >>   <media:hash algo="GOST">123...456</media:hash>
> >> </link>
> >
> > The alternative approach, which would support both a variety
> > and multiplicity of hashes would look like this:
> > <link href="foo" hash="gost:123123..., md5:0928402948..., sha256:098078097..."/>
> > This strikes me as "simpler" than the hybrid approach. Just a few of my
> > concerns with the proposed "hybrid" approach follow:
> >
> > I like binding the algorithm and value together into a single value
> > since I know of no compelling case for processing one element in
> > isolation of the other. The hash value only makes sense if you know the
> > algorithm and the algorithm is only useful when bound to a specific
> > hash value. Thus, it strikes me as simply introducing syntactic sugar
> > to specify the algorithm using a different XML component than the
> > value.
> >
> > These values are likely to be stored in databases and otherwise
> > manipulated. In all cases, for the data to be meaningful, people will
> > need to keep the binding between algorithm and hash value. It is likely
> > that storing a single string value is going to be easier for folk than
> > dealing with a multi-part value. Also, consider the effect of
> > parsers... It is likely that in order to transfer a value from an entry
> > into a database field, what you'll need to do is extract both algorithm
> > and hash value from the parse tree and then construct some string that
> > combines them. This would be particularly useful if you want to use the
> > hash value as a database key (a very reasonable thing to do...) You
> > could build and store the string "algo='GOST'>123...455<" or your
> > database might support concatenated fields, or you could build
> > "gost.123...456". I think I would go with the latter.
> >
> > Defining distinct attributes for each hash algorithm pushes
> > unnecessary syntactical complexity to the global level and thus
> > increases the complexity not only of the specification but also of all
> > applications no matter which algorithms they understand or if they
> > understand any at all. It also makes extending the list of supported
> > algorithms "expensive" since such extensions require modification to
> > the standard rather than just a registry entry. What benefit do we get
> > from having these algorithm types defined at the global syntax level?
> >
> > The hybrid approach looks very complicated to me. It means that I'll
> > have two very different places in which hash values might be found and
> > two very different syntaxes for expressing them. The result is going to
> > be more complex code than would otherwise be the case. What value comes
> > from using the hybrid approach?
> >
> > One argument for hybrid is that these elements exist already in other
> > specs. I wonder if it isn't possible that those other specs might have
> > approached the problem in a non-optimal fashion. Does it really make
> > sense to import syntax if there isn't a really good case that
> > demonstrates that doing so is the best approach?
> >
> > I am unaware of any hash algorithms that need anything other than the
> > specification of the algorithm and the value in order to be useful. If
> > there were broadly used algorithms that had more complex meta-data
> > requirements, it would be easier to understand the appeal of the hybrid
> > approach.
> >
> > I can't think of any reason why it is *useful* to separate the
> > algorithm from the hash value. Can someone enlighten me here? What
> > computation, storage or communication task becomes easier if you have
> > these two separated?
> >
> > bob wyman
> >
> > On Fri, May 14, 2010 at 3:06 PM, James Snell <[email protected]> wrote:
> >>
> >> Ok, I've been giving this some more thought and I think a hybrid
> >> approach works very well. As has been pointed out a number of times in
> >> this thread, there are existing elements in other namespaces that
> >> provide an algorithm/hash pairing. I think that the Link Extensions
> >> Draft can provide attributes for the most basic hash algorithms and
> >> applications that require hash algorithms that are not covered can
> >> fall back to the extension elements.
> >>
> >> e.g.
> >>
> >> <link href="foo" md5="abc...xyz">
> >>   <media:hash algo="GOST">123...456</media:hash>
> >> </link>
> >>
> >> This would allow for the most common cases to be easily covered while
> >> allowing for the full range of possible cases to be handled as well.
> >>
> >> - James
> >>
> >> On Wed, May 12, 2010 at 8:50 PM, Richard Salz <[email protected]> wrote:
> >> >> So the key question is: what are the main algorithms we need to
> >> >> provide attributes for?
> >> >
> >> > This is a hard question to answer -- especially for hash/digest
> >> > algorithms which tend to fall more rapidly than vetted crypto
> >> > algorithms.
> >> >
> >> > It's more verbose, but I strongly recommend using a pair of
> >> > attributes to represent algorithm/value. Use the URIs defined in the
> >> > latest XML DSIG document, perhaps with the "trick" that relative
> >> > URIs are a shorthand for the xmldsig namespace.
> >> >
> >> > /r$
> >> >
> >> > --
> >> > STSM, WebSphere Appliance Architect
> >> > https://www.ibm.com/developerworks/mydeveloperworks/blogs/soma/
> >>
> >> --
> >> - James Snell
> >> http://www.snellspace.com
> >> [email protected]
>
> --
> - James Snell
> http://www.snellspace.com
> [email protected]
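
For what it's worth, the whitespace-separated hash="algo:value ..." form proposed above can be consumed with very little code. What follows is a rough Python sketch under stated assumptions, not a definitive implementation: the names parse_hash_attribute, check_content and IANA_TO_HASHLIB are hypothetical, and it assumes tokens are separated by whitespace, values are hex-encoded, and algorithm labels follow the IANA Hash Function Textual Names registry. Algorithms the consumer does not recognise (e.g. gost) are simply skipped.

```python
import hashlib

# Assumed mapping from IANA textual names to Python hashlib names.
# Illustrative only; extend it as the registry grows.
IANA_TO_HASHLIB = {
    "md5": "md5",
    "sha-1": "sha1",
    "sha-224": "sha224",
    "sha-256": "sha256",
    "sha-384": "sha384",
    "sha-512": "sha512",
}


def parse_hash_attribute(value):
    """Split 'algo:hex algo:hex ...' into a dict of {algo: hex}."""
    hashes = {}
    for token in value.split():
        algo, sep, digest = token.partition(":")
        if not sep or not algo or not digest:
            raise ValueError("malformed hash token: %r" % token)
        # Normalise case so "MD5:ABC..." and "md5:abc..." compare equal.
        hashes[algo.lower()] = digest.lower()
    return hashes


def check_content(content, hashes):
    """Compare content (bytes) against every hash we know how to compute.

    Returns {algo: True/False} for recognised algorithms only;
    unrecognised algorithms are left out of the result.
    """
    results = {}
    for algo, expected in hashes.items():
        name = IANA_TO_HASHLIB.get(algo)
        if name is None:
            continue  # unknown algorithm, nothing we can verify
        actual = hashlib.new(name, content).hexdigest()
        results[algo] = (actual == expected)
    return results


if __name__ == "__main__":
    attr = ("md5:6705f99eccedeac20e969bef954c5fb0 "
            "sha-1:bc608e6d3d339d1a7afc406a7ea6a8f07358038b")
    for algo, digest in parse_hash_attribute(attr).items():
        print(algo, digest)
```

Each algo:value token is also already the kind of single string Bob describes wanting for storage or as a database key, so no extra concatenation step is needed.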
