On 2/23/2017 4:01 PM, Warren Young wrote:
> The PHC scheme would allow Fossil to migrate to something stronger in a
> backwards-compatible fashion:
>
>     https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md
>
> That is, if the hash argument in the F, P, and Q cards is not 40 characters and
> it has a suitable prefix, it’s a new-style hash, else it’s a legacy SHA-1 hash.
>
> (I’ve previously suggested Modular Crypt Format for this, but PHC has some nice
> properties over MCF.  See the link.)

TL;DR: Don't forget about human factors when considering a change.

Should we decide to move to a new hash function, something like PHC is a decent approach for keeping track of hashes stored internally. But without some care, it could become a usability nightmare, especially at the command line and in URLs (and wiki markup), where any "long enough" prefix of the hash serves to identify the target.

One way to keep that ease of use would be to match user input against a prefix of just the "hash string". Since PHC specifies that hashes are Base64 encoded, while SHA1 artifact ids are hexadecimal, a hash-string prefix is unlikely to collide with any existing SHA1 artifact id, at least once it reaches a reasonable length.
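To put a rough number on "a reasonable length": assuming the characters of a PHC hash string are uniformly distributed over the B64 alphabet, and that SHA1 ids are matched as lowercase hex only, an n-character prefix looks like hex with probability (16/64)^n = 4^-n. The hypothetical helper below just does that arithmetic:

```python
# Hypothetical check: how likely is a random PHC (B64) hash prefix to look
# like a hex SHA1 prefix? Assumes uniform B64 characters and lowercase-only hex.
from fractions import Fraction

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
HEX_DIGITS = set("0123456789abcdef")


def hex_lookalike_probability(prefix_len: int) -> Fraction:
    """Probability that a uniformly random B64 string of prefix_len
    characters contains only lowercase hex digits."""
    hex_in_b64 = sum(1 for c in B64_ALPHABET if c in HEX_DIGITS)  # 16 of 64
    return Fraction(hex_in_b64, len(B64_ALPHABET)) ** prefix_len


for n in (4, 6, 8, 10):
    p = hex_lookalike_probability(n)
    print(f"{n:2d} chars: 1 in {p.denominator}")
```

This prints 1 in 256 for four characters, rising to 1 in 1048576 for ten. It is only the chance of a prefix being mistakable for hex; an actual ambiguity also requires it to match the start of some SHA1 id that exists in the repo, so the practical risk is lower still.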

We would expect that most or all artifacts in a single repo would have the same $id and $parameters, so requiring the user to type them would be counter-productive. We should permit them, of course, to allow for explicitly naming a single artifact.
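A minimal sketch of that lookup, assuming (hypothetically) that stored names are either legacy 40-character hex SHA1 ids or PHC strings that always end in $salt$hash. Bare input is matched only against the hash field; input that begins with "$" is treated as an explicitly spelled-out PHC prefix. The names resolve and hash_field and the 4-character minimum are illustrative, not anything in Fossil today:

```python
# Hypothetical sketch: resolve a user-typed artifact name against stored
# names that are either legacy 40-char hex SHA1 ids or PHC strings.
# Assumes every PHC string ends in "$<salt>$<hash>", so the hash is the
# text after the last "$".


def hash_field(name: str) -> str:
    """Return the part of a stored name that bare user input is matched against."""
    if name.startswith("$"):
        return name.rsplit("$", 1)[1]  # PHC: last field is the B64 hash
    return name  # legacy SHA1: the whole hex string


def resolve(user_input: str, stored: list[str], min_len: int = 4) -> list[str]:
    """Return every stored name the input could refer to.

    Input starting with "$" is an explicit PHC prefix ($id, $parameters
    and all); anything else is matched against just the hash field, so
    users never have to type $id or $parameters.
    """
    if len(user_input) < min_len:
        raise ValueError("prefix too short")
    if user_input.startswith("$"):
        return [n for n in stored if n.startswith(user_input)]
    return [n for n in stored if hash_field(n).startswith(user_input)]


stored = [
    "a9993e364706816aba3e25717850c26c9cd0d89d",  # legacy SHA1 id
    "$sha3-256$c2FsdA$Vq3kXz0bQm8pLrT2",  # made-up PHC-style name
]
```

With that list, resolve("a999", stored) finds the SHA1 entry, while resolve("Vq3k", stored) and resolve("$sha3-256$", stored) both find the PHC one. More than one result means the prefix is ambiguous and the user has to type more.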

Would we hash with salt? I don't know. If we did, then the salt would need to remain constant for any particular artifact for the lifetime of that artifact in that repo (and its clones). The salt could be as simple as the blob type suggested by Joerg [email today 9:37am], or it could include something more like a nonce. Using the blob type (perhaps with a short nonce appended) would get the advantages Joerg noted for when blobs are ingested (during push, pull, or rebuild): specifically, blobs that look like manifests but are not could be salted so that they are not parsed as manifests when ingested.
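One way salting by blob type might look, as a hypothetical sketch: SHA3-256 stands in for whatever hash is chosen, the salt is the blob type plus an optional nonce, and the salt is carried inside a $id$salt$hash name so it stays fixed for the artifact's lifetime and any clone can re-verify it. None of these function names or formats come from Fossil:

```python
import base64
import hashlib

ALG_ID = "sha3-256"  # hypothetical PHC $id for this sketch


def b64_encode(data: bytes) -> str:
    """PHC-style B64: standard Base64 alphabet, '=' padding stripped."""
    return base64.b64encode(data).decode("ascii").rstrip("=")


def b64_decode(text: str) -> bytes:
    """Inverse of b64_encode: restore padding before decoding."""
    return base64.b64decode(text + "=" * (-len(text) % 4))


def _digest(salt: bytes, content: bytes) -> bytes:
    # Length-prefix the salt so the salt/content boundary can't be shifted.
    h = hashlib.sha3_256()
    h.update(len(salt).to_bytes(1, "big"))  # assumes salt < 256 bytes
    h.update(salt)
    h.update(content)
    return h.digest()


def make_name(content: bytes, blob_type: str, nonce: bytes = b"") -> str:
    """Name a blob as $id$salt$hash, salting with its type plus optional nonce."""
    salt = blob_type.encode("ascii") + nonce
    return f"${ALG_ID}${b64_encode(salt)}${b64_encode(_digest(salt, content))}"


def parse_name(name: str) -> tuple[bytes, bytes]:
    """Split $id$salt$hash into (salt, digest); raises ValueError otherwise."""
    empty, alg, salt_b64, hash_b64 = name.split("$")
    if empty or alg != ALG_ID:
        raise ValueError(f"not a {ALG_ID} name: {name!r}")
    return b64_decode(salt_b64), b64_decode(hash_b64)


def verify(name: str, content: bytes) -> bool:
    """Recompute the hash from the salt embedded in the name."""
    salt, digest = parse_name(name)
    return _digest(salt, content) == digest


def is_manifest(name: str) -> bool:
    """Only blobs salted as manifests would be parsed as manifests on ingest."""
    salt, _ = parse_name(name)
    return salt.startswith(b"manifest")
```

Identical bytes stored once as "manifest" and once as "file" get different names, and on ingest only the first would be handed to the manifest parser; a file that merely smells like a manifest stays a file.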

Who gets to decide which hash should be used in a repo: just Fossil's developers? The creator of a repo? The user of a repo? Regardless, I think we would agree that once a particular artifact is named in a manifest, it cannot change hashes, since that would require changing that manifest, which would change its hash, and so on. But perhaps the next checkin could use a different hash for some of its content, and for naming its own manifest. That would allow existing names for old artifacts to be preserved alongside a new choice of hash function for naming new artifacts.
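Mixing schemes across checkins relies on telling the two kinds of name apart. Here is a sketch of the rule Warren described (not 40 characters and a suitable prefix means new-style, otherwise legacy SHA-1), assuming "$" is the suitable prefix, plus a toy manifest showing why a name, once recorded, is frozen: the manifest's own id is a hash over the names it lists. The card layout and toy_manifest_id are simplified and hypothetical:

```python
import hashlib
import re

LEGACY_SHA1 = re.compile(r"[0-9a-f]{40}")


def scheme_of(name: str) -> str:
    """Classify an artifact name.

    Not 40 characters and starting with "$" (assumed to be the
    "suitable prefix"): new-style, scheme taken from the PHC $id field.
    Otherwise it must be a legacy 40-char lowercase hex SHA1 id.
    """
    if len(name) != 40 and name.startswith("$"):
        scheme = name.split("$")[1]
        if not scheme:
            raise ValueError(f"empty PHC id in {name!r}")
        return scheme
    if LEGACY_SHA1.fullmatch(name):
        return "sha1"
    raise ValueError(f"unrecognized artifact name: {name!r}")


def toy_manifest_id(file_names: dict[str, str]) -> str:
    """Hypothetical manifest id: SHA3-256 over simplified "F path name" cards."""
    cards = "".join(f"F {path} {name}\n" for path, name in sorted(file_names.items()))
    return hashlib.sha3_256(cards.encode("utf-8")).hexdigest()
```

A new checkin can list an unchanged old file by its SHA1 id next to a new file with a new-style name. Renaming the old file instead would change the cards and therefore the manifest id, which is exactly the cascade that keeps old names fixed.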

Warren [email today 2:54pm] is right that there are long lead times between any change we make and its dispersal into the wide universe of official distros and personal users. That implies that if we think SHA1 collisions could become a real threat within five years, we need to implement whatever change we decide on soon, so that it is in widespread use *before* the threat is real.

All of that said, should we make a change?

I'm not sure. Switching to a new hash has a non-trivial cost. Storing it in the PHC style (or inventing our own hash-type metadata trick) seems like the way to mitigate the least expensive part of that cost. The rest of the cost is in the myriad implementation details and in designing for the best backward compatibility, to reduce friction for the user with hundreds of personal repos.

If we do make a change, I would resist the temptation to immediately rewrite the entire history to use the new hash. Certainly it would be possible to get a tree of all manifests and work through it, replacing every SHA1 string with the new hash. But that ignores all the other places that might have referred to a particular checkin by its SHA1 hash: most obviously wiki markup in wiki pages and technotes, but also all those communications about a work in progress. "Hey Joe, what happened in checkin [123456]?" in an email or chat log could no longer be related back to history.

Perhaps that could be mitigated by tagging each newly rewritten checkin with the SHA1 hash, possibly inventing a new kind of tag that can be matched by prefix for the purpose.
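A sketch of that mitigation, assuming a hypothetical tag (called "sha1" here) that records each rewritten checkin's original SHA1 id, and assuming old ids are compared as lowercase hex. The Checkin class and find_by_old_sha1 are illustrative only:

```python
from dataclasses import dataclass, field

ALIAS_TAG = "sha1"  # hypothetical tag name holding the pre-rewrite SHA1 id


@dataclass
class Checkin:
    name: str  # new-style name after the rewrite
    tags: dict[str, str] = field(default_factory=dict)


def find_by_old_sha1(prefix: str, checkins: list[Checkin], min_len: int = 4) -> list[Checkin]:
    """Return rewritten checkins whose alias tag starts with the given SHA1 prefix."""
    prefix = prefix.lower()
    if len(prefix) < min_len:
        raise ValueError("prefix too short to be useful")
    return [c for c in checkins if c.tags.get(ALIAS_TAG, "").startswith(prefix)]
```

So "Hey Joe, what happened in checkin [123456]?" could still be answered: find_by_old_sha1("123456", checkins) returns the rewritten checkin whose tag begins with 123456. As with ordinary prefixes, more than one hit means the prefix is ambiguous.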

-- Ross Berteig r...@cheshireeng.com Cheshire Engineering Corp. http://www.CheshireEng.com/ +1 626 303 1602

_______________________________________________
fossil-users mailing list
fossil-users@lists.fossil-scm.org
http://lists.fossil-scm.org:8080/cgi-bin/mailman/listinfo/fossil-users
