On 2/23/2017 4:01 PM, Warren Young wrote:
> The PHC scheme would allow Fossil to migrate to something stronger in a
> backwards-compatible fashion:
>
> https://github.com/P-H-C/phc-string-format/blob/master/phc-sf-spec.md
>
> That is, if the hash argument in the F, P, and Q cards is not 40 characters and
> it has a suitable prefix, it’s a new-style hash, else it’s a legacy SHA-1 hash.
>
> (I’ve previously suggested Modular Crypt Format for this, but PHC has some nice
> properties over MCF. See the link.)
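
To make Warren's rule concrete, here is a rough Python sketch of how a parser might classify the hash argument of an F, P, or Q card. It assumes that legacy IDs are exactly 40 lowercase hex digits and that a "suitable prefix" means a PHC $id field ("$" followed by 1 to 32 characters from [a-z0-9-], as the PHC spec describes). classify_card_hash is a hypothetical name, not anything in Fossil today.

```python
import re

# Assumption: legacy artifact IDs are 40 lowercase hex digits (SHA-1).
SHA1_RE = re.compile(r"[0-9a-f]{40}")

# PHC strings start with "$<id>", where <id> is 1-32 characters from
# [a-z0-9-], followed by another "$" field or the end of the string.
PHC_ID_RE = re.compile(r"\$[a-z0-9-]{1,32}(?:\$|$)")


def classify_card_hash(arg: str) -> str:
    """Classify an F/P/Q card hash argument (hypothetical helper).

    Returns "sha1" for a legacy ID, "phc" for a new-style PHC string,
    and "invalid" for anything else.
    """
    if len(arg) == 40 and SHA1_RE.fullmatch(arg):
        return "sha1"
    if PHC_ID_RE.match(arg):
        return "phc"
    return "invalid"


print(classify_card_hash("da39a3ee5e6b4b0d3255bfef95601890afd80709"))  # sha1
print(classify_card_hash("$sha3-256$EXAMPLEhashFIELD"))  # phc
```

The explicit "$" check means a malformed card is rejected rather than silently treated as SHA-1, which seems safer than a bare "else legacy" fallback.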

TL;DR: Don't forget about human factors when considering a change.
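
If we decide to move to a new hash function, something like PHC is a
decent approach for recording which hash produced each name stored
internally. But without some care, it becomes a usability nightmare,
especially at the command line and in URLs (and wiki markup), where any
"long enough" prefix of the hash identifies the target. A PHC string
starts with the same $id (and usually the same $parameters) for every
artifact in a repo, so a naive prefix match would force the user to type
that boilerplate before reaching any characters that distinguish one
artifact from another.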

One way to keep that ease of use would be to match user input against a
prefix of just the "hash string", the final field of the PHC string.
Since PHC specifies that hashes are Base64 encoded, while SHA1 artifact
IDs are hexadecimal, a new-style hash is unlikely to be mistaken for any
existing SHA1 artifact ID once the prefix reaches a reasonable length.

We would expect most or all artifacts in a single repo to share the same
$id and $parameters, so requiring the user to type them would be
counter-productive. We should still permit them, of course, so that a
single artifact can be named explicitly.
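
A minimal sketch of that matching rule, assuming the hash is always the last "$"-separated field of a stored PHC name and that legacy SHA-1 names sit in the same list. resolve, hash_field, and the 4-character minimum are all hypothetical choices for illustration, and the PHC names below are placeholders, not real digests.

```python
def hash_field(name: str) -> str:
    """Return the part of a stored name that users abbreviate.

    For a PHC string this is assumed to be the final "$"-separated field
    (the hash); a legacy SHA-1 hex ID is returned unchanged.
    """
    return name.rsplit("$", 1)[-1] if name.startswith("$") else name


def resolve(user_input: str, names, min_len: int = 4) -> str:
    """Resolve what a user typed to exactly one stored artifact name.

    Input starting with "$" is the explicit form and is matched against the
    whole PHC string; anything else is matched against the hash field only,
    so the repo-wide $id and $parameters never need to be typed.
    """
    if user_input.startswith("$"):
        hits = [n for n in names if n.startswith(user_input)]
    else:
        if len(user_input) < min_len:
            raise ValueError(f"prefix {user_input!r} is too short")
        hits = [n for n in names if hash_field(n).startswith(user_input)]
    if len(hits) != 1:
        raise LookupError(f"{len(hits)} artifacts match {user_input!r}")
    return hits[0]


names = [
    "da39a3ee5e6b4b0d3255bfef95601890afd80709",  # legacy SHA-1 ID
    "$sha3-256$Zm9vYmFyYmF6cXV4",                 # placeholder new-style names
    "$sha3-256$Z29vZGJ5ZQ",
]
print(resolve("da39", names))            # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(resolve("Zm9v", names))            # $sha3-256$Zm9vYmFyYmF6cXV4
print(resolve("$sha3-256$Z29v", names))  # $sha3-256$Z29vZGJ5ZQ
```

One wrinkle the sketch does not address: Base64 is case-sensitive, so unlike a hex prefix, a Base64 prefix cannot safely be lowercased before matching.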

Would we hash with salt? I don't know. If we did, the salt would need to
remain constant for any particular artifact for the lifetime of that
artifact in that repo (and its clones). The salt could be as simple as
the blob type suggested by Joerg [email today 9:37am], or it could also
include something more like a nonce. Using the blob type (perhaps with a
short nonce appended) would bring the advantages Joerg noted when blobs
are ingested (during push, pull, or rebuild): specifically, blobs that
look like manifests but are not could be salted so that they are not
parsed as manifests when ingested.
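
A rough sketch of the salting idea, assuming SHA3-256 purely as a placeholder (the choice of algorithm is still open) and a made-up salt layout of blob type, a colon, and an optional nonce. salted_name, blob_type_of, and verify are hypothetical helpers. What it tries to show is that the salt lives in the name, so it cannot drift over the artifact's lifetime, and a receiver can read the blob type from the name before deciding whether to parse the blob as a manifest.

```python
import base64
import hashlib


def b64(data: bytes) -> str:
    # PHC uses standard Base64 with the trailing "=" padding dropped.
    return base64.b64encode(data).decode("ascii").rstrip("=")


def unb64(text: str) -> bytes:
    # Restore the padding that b64() removed before decoding.
    return base64.b64decode(text + "=" * (-len(text) % 4))


def salted_name(content: bytes, blob_type: str, nonce: bytes = b"") -> str:
    """Name a blob as "$sha3-256$<salt>$<hash>" (hypothetical scheme).

    The salt is the blob type, a ":" separator, and an optional nonce.
    Because the salt is stored in the name itself, it stays constant for
    the artifact's lifetime in this repo and every clone.
    """
    salt = blob_type.encode("ascii") + b":" + nonce
    digest = hashlib.sha3_256(salt + content).digest()
    return f"$sha3-256${b64(salt)}${b64(digest)}"


def blob_type_of(name: str) -> str:
    """Recover the blob type from a name without parsing the blob."""
    _, _alg, salt_b64, _hash_b64 = name.split("$")
    return unb64(salt_b64).split(b":", 1)[0].decode("ascii")


def verify(name: str, content: bytes) -> bool:
    """Re-hash content with the salt recorded in its name."""
    _, alg, salt_b64, hash_b64 = name.split("$")
    if alg != "sha3-256":
        raise ValueError(f"unsupported hash {alg!r}")
    digest = hashlib.sha3_256(unb64(salt_b64) + content).digest()
    return b64(digest) == hash_b64


blob = b"C looks-like-a-manifest\n"
name = salted_name(blob, "file", nonce=b"\x01")
print(blob_type_of(name))  # file
print(verify(name, blob))  # True
```

Since the same bytes salted as "file" and as "manifest" get different names, a file that merely smells like a manifest never collides with one.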

Who gets to decide which hash a repo uses: just Fossil's developers? The
creator of the repo? Each user of the repo? Regardless, I think we would
agree that once a particular artifact is named in a manifest, it cannot
change hashes: that would require changing the manifest, which would
change the manifest's own hash, which would in turn change every later
checkin that names it as a parent, and so on. But perhaps the next
checkin could use a different hash for some of its content and to name
its own manifest. That would preserve the existing names of old
artifacts while letting a newly chosen hash function name new artifacts.
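
A toy Python illustration of both halves of that argument. The manifest format is drastically simplified (only F and P cards, nothing else Fossil requires), and new_name is a hypothetical PHC-style namer using SHA3-256 as a stand-in; none of this is how Fossil actually builds manifests.

```python
import base64
import hashlib


def sha1_name(data: bytes) -> str:
    # Legacy name: 40 hex digits of SHA-1.
    return hashlib.sha1(data).hexdigest()


def new_name(data: bytes) -> str:
    # Hypothetical new-style name: "$sha3-256$" plus unpadded Base64.
    digest = hashlib.sha3_256(data).digest()
    return "$sha3-256$" + base64.b64encode(digest).decode("ascii").rstrip("=")


def manifest(files: dict, parent: str = "") -> bytes:
    """Toy manifest: sorted "F <path> <name>" cards and an optional P card."""
    text = "".join(f"F {path} {name}\n" for path, name in sorted(files.items()))
    if parent:
        text += f"P {parent}\n"
    return text.encode("utf-8")


readme = b"hello\n"
m1 = manifest({"README": sha1_name(readme)})
m1_name = sha1_name(m1)

# Renaming README with the new hash changes m1's bytes, hence m1's own
# name, hence the P card of every checkin built on m1, and so on.
print(sha1_name(manifest({"README": new_name(readme)})) != m1_name)  # True

# Alternative: the next checkin keeps README's existing SHA-1 name, uses
# the new hash for new content, and is itself named with the new hash.
m2 = manifest({"README": sha1_name(readme), "NEWS": new_name(b"news\n")},
              parent=m1_name)
m2_name = new_name(m2)
print(m2.decode("utf-8"))
```

The old checkin m1 is never touched, so every existing reference to it stays valid; only names minted from m2 onward use the new hash.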

Warren [email today 2:54pm] is right that there are long lead times
between any change we make and its spread into the wide universe of
official distros and personal users. That implies that if we think SHA1
collisions could become a real threat within about five years, we need
to implement whatever change we decide on soon, so that it is in
widespread use *before* the threat is real.

All of that said, should we make a change at all?

I'm not sure. Switching to a new hash has a non-trivial cost. Storing
names in the PHC style (or inventing our own trick for recording the
hash type) seems like the way to handle the least expensive part of that
cost. The rest of the cost lies in the myriad implementation details and
in designing for the best possible backward compatibility, to reduce
friction for the user with hundreds of personal repos.

If we do make a change, I would resist the temptation to immediately
rewrite the entire history to use the new hash. It would certainly be
possible to walk the tree of all manifests, replacing every SHA1 string
with the new hash. But that ignores all the other places that might
refer to a particular checkin by its SHA1 hash: most obviously wiki
markup in wiki pages and technotes, but also all those communications
about work in progress. "Hey Joe, what happened in checkin [123456]?" in
an email or chat log could no longer be related back to history.

Perhaps that could be mitigated by tagging each rewritten checkin with
its original SHA1 hash, possibly inventing a new kind of tag that can be
matched by prefix for the purpose.
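
A minimal sketch of such a lookup, assuming a hypothetical tag named "legacy-sha1" whose value is the checkin's original SHA-1, and modelling the repo's tags as plain (checkin, tag, value) triples. That representation and legacy_lookup are illustrative stand-ins, not Fossil's tag tables; the new-style names are placeholders.

```python
LEGACY_TAG = "legacy-sha1"  # hypothetical tag name


def legacy_lookup(ref: str, tags, min_len: int = 4) -> str:
    """Resolve an old SHA-1 reference such as "[123456]" to a rewritten checkin.

    tags is an iterable of (checkin_name, tag_name, tag_value) triples, a
    hypothetical stand-in for however the repo stores tags. Only tags named
    LEGACY_TAG are consulted, and their values are matched by prefix.
    """
    prefix = ref.strip("[]").lower()
    if len(prefix) < min_len:
        raise ValueError(f"prefix {prefix!r} is too short")
    hits = {name for name, tag, value in tags
            if tag == LEGACY_TAG and value.startswith(prefix)}
    if len(hits) != 1:
        raise LookupError(f"{len(hits)} checkins match {ref!r}")
    return hits.pop()


tags = [
    ("$sha3-256$AAAA", LEGACY_TAG, "123456" + "a" * 34),  # placeholder names
    ("$sha3-256$BBBB", LEGACY_TAG, "123457" + "b" * 34),
    ("$sha3-256$CCCC", "branch", "trunk"),
]
print(legacy_lookup("[123456]", tags))  # $sha3-256$AAAA
```

Rejecting ambiguous prefixes (here "12345" would match both tagged checkins) matters more than usual for this case, since nobody can go back and add digits to an old email.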
-- Ross Berteig r...@cheshireeng.com Cheshire Engineering Corp.
http://www.CheshireEng.com/ +1 626 303 1602