On Feb 26, 2017, at 1:45 PM, Richard Hipp <d...@sqlite.org> wrote:
> 
> initial implementation will support SHA1 and SHA3-228

224.

> Other hash algorithms may be supported
> in future releases as long as each hash algorithm has a unique hash
> length

That seems brittle.  There are far fewer hash sizes than hash algorithms:

    https://en.wikipedia.org/wiki/List_of_hash_functions

You’re basically encoding a hidden type field here instead of making it 
explicit.
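To make that concrete, here’s a quick Python sketch that groups a few of the algorithms hashlib guarantees on every platform by the length of their hex digests. The selection is arbitrary. The point is only that a length alone can’t tell SHA-224 from SHA3-224, or SHA-256 from SHA3-256 or BLAKE2s.

```python
import hashlib
from collections import defaultdict

# A few algorithms hashlib guarantees on every platform (chosen for
# illustration; RIPEMD-160 and friends are not guaranteed, so left out).
CANDIDATES = ["sha1", "sha224", "sha256", "sha3_224", "sha3_256", "blake2s"]


def hex_lengths(data: bytes) -> dict[int, list[str]]:
    """Group the candidate algorithms by the length of their hex digest."""
    by_len: dict[int, list[str]] = defaultdict(list)
    for name in CANDIDATES:
        digest = hashlib.new(name, data).hexdigest()
        by_len[len(digest)].append(name)
    return dict(by_len)


if __name__ == "__main__":
    for length, names in sorted(hex_lengths(b"hello").items()):
        print(length, names)
    # 40 ['sha1']
    # 56 ['sha224', 'sha3_224']
    # 64 ['sha256', 'sha3_256', 'blake2s']
```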

This is why I proposed the MCF and PHC formats: MCF relies on a registry of 
hash IDs, which Fossil could include, and PHC self-documents.
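Here’s roughly what I mean, as a Python sketch. It is PHC-inspired rather than PHC-conformant: real PHC and MCF strings carry the digest in a base64 variant, while this keeps Fossil’s hex. The `ALGORITHMS` registry and both function names are made up for illustration.

```python
import hashlib

# Hypothetical registry of self-describing algorithm IDs. Real MCF/PHC
# strings use a base64 variant for the digest; hex is kept here only
# because Fossil's artifact IDs are hex.
ALGORITHMS = {
    "sha1": hashlib.sha1,
    "sha3-224": hashlib.sha3_224,
    "sha3-256": hashlib.sha3_256,
}
HEX = set("0123456789abcdef")


def make_artifact_id(alg: str, content: bytes) -> str:
    """Return a PHC-like '$alg$hexdigest' string for content."""
    digest = ALGORITHMS[alg](content).hexdigest()
    return f"${alg}${digest}"


def parse_artifact_id(artifact_id: str) -> tuple[str, str]:
    """Split '$alg$hexdigest' into (alg, hexdigest), validating both parts."""
    parts = artifact_id.split("$")
    # The leading '$' yields an empty first field: ['', alg, digest]
    if len(parts) != 3 or parts[0] != "":
        raise ValueError(f"not an $alg$digest string: {artifact_id!r}")
    _, alg, digest = parts
    if alg not in ALGORITHMS:
        raise ValueError(f"unknown hash algorithm: {alg!r}")
    # Length is only a sanity check here; the name says which algorithm it is.
    expected_len = ALGORITHMS[alg]().digest_size * 2
    if len(digest) != expected_len or not set(digest) <= HEX:
        raise ValueError(f"{alg} digest must be {expected_len} lowercase hex chars")
    return alg, digest
```

Since the algorithm name travels with the digest, two algorithms with the same output length can no longer be confused.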

I don’t know how often Fossil queries hashes by doing text searches in P cards 
and such, but the delimiters in PHC and MCF make doing so straightforward: LIKE 
'P %$' || ?1 || '%'.  Nearly line noise, but you see what it’s doing, I trust.
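Spelled out against SQLite, the digest has to be bound as a parameter and concatenated onto the pattern, since a `?` inside a quoted string is just a literal character. The `cards` table below is a hypothetical stand-in, not Fossil’s schema.

```python
import sqlite3


def demo_db() -> sqlite3.Connection:
    """In-memory stand-in for a table of manifest card lines (hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE cards(line TEXT)")
    conn.executemany("INSERT INTO cards(line) VALUES (?)", [
        ("P $sha1$" + "a" * 40,),
        ("P $sha3-224$" + "b" * 56,),
        ("C some+check-in+comment",),
    ])
    return conn


def find_p_cards(conn: sqlite3.Connection, digest: str) -> list[str]:
    """Return P-card lines that mention digest right after a '$' delimiter.

    The digest is bound as a parameter and concatenated into the LIKE
    pattern ('||' binds tighter than LIKE in SQLite). Hex digests contain
    no LIKE wildcards ('%' or '_'), so no escaping is needed, and the
    trailing '%' also makes a digest prefix match.
    """
    rows = conn.execute(
        "SELECT line FROM cards WHERE line LIKE 'P %$' || ? || '%'",
        (digest,),
    )
    return [line for (line,) in rows]
```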

> In Fossil 1.x, there was a 1-to-1 correspondence between hash values
> and artifacts.  Since it supports multiple hash algorithms, Fossil 2.0
> now has a many-to-one relationship between hash values and artifacts,

I don’t see why that must be so.

A given Fossil 2.0 repo may have mixed hash algorithms, but isn’t each artifact 
identified by only one algorithm?  That is, won’t an existing upgraded repo 
have SHA-1 hashes identifying legacy artifacts and K224 hashes identifying 
newer artifacts?  (And maybe later, K256 hashes identifying Fossil 2.x 
artifacts where x > 0?)

Why can’t you continue to use blob.uuid for the hash, and maybe add your “alg” 
and “aux” columns to table blob?  It’ll require a primary key change, but that 
can be part of “fossil rebuild.”

Surely you aren’t suggesting that all new checkins be multiply-hashed using all 
supported algorithms, simply so that you can refer to them via every compiled-in 
hash type?

> The "alg" field will be a numeric 0 for the preferred hash, and some other
> code (yet to be decided) for alternative hashes.

Why not a short string, like “SHA3-224”, or even the more ambitious format 
suggested by PHC, where you can also encode options like algorithm rounds?  

Doesn’t SQLite index such things efficiently, basically into a log(n)-deep tree 
of hashes for each hash type?  Wouldn’t integers create the same B-tree 
structure, only now you have an opaque constant to document somewhere, and a 
registry of IDs to hash definitions to maintain?
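One way to check the B-tree point is to ask SQLite’s query planner. This sketch builds the same hypothetical blob table twice, once with a TEXT alg column and once with an INTEGER one, and returns the EXPLAIN QUERY PLAN text for an (alg, uuid) lookup. I’d expect both to report an index SEARCH rather than a full-table SCAN; the exact wording varies between SQLite versions.

```python
import sqlite3


def query_plan(alg_type: str, alg_value) -> str:
    """Build a throwaway blob table whose alg column has the given SQL type
    ('TEXT' or 'INTEGER') and return SQLite's plan for an (alg, uuid) lookup."""
    conn = sqlite3.connect(":memory:")
    # alg_type is only ever one of the fixed type names above, never user input.
    conn.execute(
        f"CREATE TABLE blob(rid INTEGER PRIMARY KEY, alg {alg_type} NOT NULL,"
        " uuid TEXT NOT NULL, UNIQUE(alg, uuid))"
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT rid FROM blob WHERE alg = ? AND uuid = ?",
        (alg_value, "0" * 56),
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable detail text.
    return " / ".join(row[-1] for row in rows)


if __name__ == "__main__":
    print(query_plan("TEXT", "sha3-224"))
    print(query_plan("INTEGER", 1))
```

The string keys cost a few extra bytes per index entry, but the lookup strategy is the same.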

> (8) Is it possible for two Fossil servers to sync if they are using
> different preferred hash algorithms?  This is a desired goal, but I
> do not yet understand how hard that will be.

Why is this desired?

I don’t see why this is an important case to solve.  If a given site has a 
Fossil server stuck on Fossil 1.x and thus on SHA-1, the Fossil 2.x servers 
syncing with it can be configured to use SHA-1 only for compatibility.  Problem 
solved.

To move to a newer hash, the community around each given repo must agree on a 
flag day, by which time all Fossil executables need to be upgraded to Fossil 
2.x.

> (9) Can a Fossil 1.x client push/pull/clone from a Fossil 2.0 server,
> assuming the repository uses SHA1 has it preferred hash algorithm?
> This is desirable, but I am willing to sacrifice this capability in
> order to reduce complexity.

Agreed as far as it goes, but consider how willing you’ll be to backport Fossil 
2 features to Fossil 1 if you don’t design in this capability.

That is, if some non-hash-related feature lands in Fossil 2, and it solves a 
given user’s problem, are you going to insist that they upgrade to Fossil 2 to 
get it, or will you backport it to Fossil 1 to placate them?

It’s one thing to be stuck with a whole bunch of Fossil 1.29 clients used by 
Debian Jessie users who refuse to use anything but what’s in the package repo. 
It’s quite another to be unable to upgrade the servers as well because that’ll 
break all the clients.

> (10) Should Keccak hashes that are not part of the SHA3 standard
> (example: Keccak[196]) be supported?

Yes, including a 160-bit Keccak variant (“SHA3-160”), whose digests would be the 
same length as SHA-1’s.  That’s why you shouldn’t do length-based detection. :)

Separately from all of the above, I don’t see that your proposal addresses 
Joerg Sonnenberg’s concerns: different hash values for the same data in 
different contexts (ticket, commit, wiki, etc.), size somehow included in the 
hash, etc.
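For what it’s worth, one common way to handle both of those is what Git does for its objects: hash a short header naming the artifact type and the content length ahead of the content itself. A sketch, with the kind names and the SHA3-224 default being my assumptions rather than anything Joerg or you have proposed:

```python
import hashlib


def artifact_hash(kind: str, content: bytes, alg: str = "sha3_224") -> str:
    """Hash content with a '<kind> <length>' header and a NUL byte prepended.

    The same bytes stored as different kinds of artifact (ticket, wiki,
    check-in, ...) get different IDs, and the length is bound into the digest.
    """
    header = f"{kind} {len(content)}\0".encode("ascii")
    return hashlib.new(alg, header + content).hexdigest()
```

With kind "blob" and alg "sha1" this reproduces Git’s blob IDs: an empty file hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, which is what `git hash-object` reports for it.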
_______________________________________________
fossil-dev mailing list
fossil-dev@mailinglists.sqlite.org
http://mailinglists.sqlite.org/cgi-bin/mailman/listinfo/fossil-dev
