On Feb 26, 2017, at 1:45 PM, Richard Hipp <d...@sqlite.org> wrote:
>
> initial implementation will support SHA1 and SHA3-224.
> Other hash algorithms may be supported
> in future releases as long as each hash algorithm has a unique hash
> length

That seems brittle. There are many fewer hash sizes than hash algorithms:

https://en.wikipedia.org/wiki/List_of_hash_functions

You’re basically encoding a hidden type field here instead of making it explicit. This is why I proposed the MCF and PHC formats: MCF relies on a registry of hash IDs, which Fossil could include, and PHC self-documents.

I don’t know how often Fossil queries hashes by doing text searches in P cards and such, but the delimiters in PHC and MCF make doing so straightforward:

    LIKE "P %$?1%"

Nearly line noise, but you see what it’s doing, I trust.

> In Fossil 1.x, there was a 1-to-1 correspondence between hash values
> and artifacts. Since it supports multiple hash algorithms, Fossil 2.0
> now has a many-to-one relationship between hash values and artifacts,

I don’t see why that must be so. A given Fossil 2.0 repo may have mixed hash algorithms, but isn’t each artifact identified by only one algorithm? That is, won’t an existing upgraded repo have SHA-1 hashes identifying legacy artifacts and K224 hashes identifying newer artifacts? (And maybe later, K256 hashes identifying Fossil 2.x artifacts where x > 0?)

Why can’t you continue to use blob.uuid for the hash, and maybe add your “alg” and “aux” columns to table blob? It’ll require a primary key change, but that can be part of “fossil rebuild.”

Surely you aren’t suggesting that all new check-ins be hashed multiple times, once with each supported algorithm, simply so that you can refer to them via every compiled-in hash type?

> The "alg" field will be a numeric 0 for the preferred hash, and some other
> code (yet to be decided) for alternative hashes.

Why not a short string, like “SHA3-224”, or even the more ambitious format suggested by PHC, where you can also encode options like algorithm rounds?
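To make the explicit-tag alternative concrete, here is a minimal Python sketch, not Fossil code. Everything in it is an assumption for illustration: the `$<alg>$<hex>` layout is only MCF-style (real PHC strings use base64 and optional parameters), and the algorithm names, the `blob` and `card` tables, and the helpers `tagged_hash`, `parse_tagged`, and `lookup` are hypothetical.

```python
import hashlib
import sqlite3
from typing import Optional, Tuple

# Hypothetical algorithm names for illustration only; a real system would take
# them from an MCF-style registry or a PHC-style self-documenting identifier.
ALGORITHMS = {
    "sha1": hashlib.sha1,
    "sha224": hashlib.sha224,
    "sha3-224": hashlib.sha3_224,
}


def tagged_hash(alg: str, data: bytes) -> str:
    """Return a self-describing '$<alg>$<hex digest>' string (MCF-style)."""
    return f"${alg}${ALGORITHMS[alg](data).hexdigest()}"


def parse_tagged(tagged: str) -> Tuple[str, str]:
    """Split '$<alg>$<hex>' into (alg, hex); reject malformed or unknown tags."""
    parts = tagged.split("$")
    if len(parts) != 3 or parts[0] != "" or parts[1] not in ALGORITHMS:
        raise ValueError(f"unrecognized tagged hash: {tagged!r}")
    return parts[1], parts[2]


# Length alone is ambiguous: SHA-224 and SHA3-224 both give 56 hex digits,
# so only the explicit tag tells the two apart.
sample = b"example artifact"
assert len(hashlib.sha224(sample).hexdigest()) == 56
assert len(hashlib.sha3_224(sample).hexdigest()) == 56

# Hypothetical 'blob' table: each artifact keeps exactly one hash, and the
# algorithm is stored as a short string instead of an opaque integer code.
# UNIQUE(alg, uuid) makes SQLite build a composite index on both columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE blob(rid INTEGER PRIMARY KEY, uuid TEXT NOT NULL,"
    " alg TEXT NOT NULL, UNIQUE(alg, uuid))"
)
artifacts = [("sha1", b"legacy check-in"), ("sha3-224", b"new check-in")]
for rid, (alg, content) in enumerate(artifacts, start=1):
    conn.execute(
        "INSERT INTO blob(rid, uuid, alg) VALUES (?, ?, ?)",
        (rid, ALGORITHMS[alg](content).hexdigest(), alg),
    )


def lookup(tagged: str) -> Optional[int]:
    """Resolve a tagged hash to its artifact rid, or None if it is unknown."""
    alg, digest = parse_tagged(tagged)
    row = conn.execute(
        "SELECT rid FROM blob WHERE alg = ? AND uuid = ?", (alg, digest)
    ).fetchone()
    return row[0] if row else None


# The '$' delimiters keep text searches simple. This made-up 'card' table
# holds P-card-like lines; find those whose parent hash is tagged SHA3-224.
conn.execute("CREATE TABLE card(line TEXT)")
for alg, content in artifacts:
    conn.execute("INSERT INTO card VALUES (?)", ("P " + tagged_hash(alg, content),))
sha3_parents = conn.execute(
    "SELECT line FROM card WHERE line LIKE 'P %$' || ? || '$%'", ("sha3-224",)
).fetchall()
```

In this sketch the `UNIQUE(alg, uuid)` constraint gives SQLite one composite index keyed first by the algorithm string. That is essentially the same B-tree shape an integer code would produce, at the cost of a few extra bytes per key, but without a separate table of magic numbers to document.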
Doesn’t SQLite index such strings efficiently, basically into a log(n)-deep tree of hashes for each hash type? Wouldn’t integers create the same B-tree structure, only now you have an opaque constant to document somewhere, and a registry of IDs to hash definitions to maintain?

> (8) Is it possible for two Fossil servers to sync if they are using
> different preferred hash algorithms? This is a desired goal, but I
> do not yet understand how hard that will be.

Why is this desired? I don’t see why this is an important case to solve. If a given site has a Fossil server stuck on Fossil 1.x and thus on SHA-1, the Fossil 2.x servers syncing with it can be configured to use only SHA-1 for compatibility. Problem solved.

To move to a newer hash, the community around each given repo must agree on a flag day, by which time all Fossil executables need to be upgraded to Fossil 2.x.

> (9) Can a Fossil 1.x client push/pull/clone from a Fossil 2.0 server,
> assuming the repository uses SHA1 as its preferred hash algorithm?
> This is desirable, but I am willing to sacrifice this capability in
> order to reduce complexity.

Agreed as far as it goes, but consider how willing you’ll be to backport Fossil 2 features to Fossil 1 if you don’t design in this capability. That is, if some non-hash-related feature lands in Fossil 2, and it solves a given user’s problem, are you going to insist that they upgrade to Fossil 2 to get it, or will you backport it to Fossil 1 to placate them?

It’s one thing to be stuck with a whole bunch of Fossil 1.29 clients used by Debian Jessie users who refuse to use anything but what’s in the package repo. It’s quite another to be unable to upgrade the servers as well because that’ll break all the clients.

> (10) Should Keccak hashes that are not part of the SHA3 standard
> (example: Keccak[196]) be supported?

Yes, including SHA3-160, and that’s why you shouldn’t do length-based detection.
:)

Separately from all of the above, I don’t see that your proposal addresses Joerg Sonnenberg’s concerns: different hash values for the same data in different contexts (ticket, commit, wiki, etc.), size somehow included in the hash, etc.

_______________________________________________
fossil-dev mailing list
fossil-dev@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/fossil-dev