Re: [darcs-devel] Checksums?

David Roundy Mon, 11 Dec 2006 10:33:55 -0800

On Mon, Dec 11, 2006 at 11:46:36PM +1100, William Uther wrote:
>   It seems that one potential advantage that Monotone has is that it  
> has really strong data integrity guarantees.  Everything is  
> checksummed.  Darcs has its advantages too, but I'm guessing I don't  
> need to mention them here :).
> 
>   It doesn't seem that Darcs has a similar integrity guarantee  
> (please correct me if I've missed it).  There is 'darcs check' that  
> can "Check the repository for consistency. Check verifies that the  
> patches stored in the repository, when successively applied to an  
> empty tree, properly recreate the stored pristine tree."  But this  
> just checks that the patches and pristine tree are consistent.  It  
> does not check that they are the same as someone else's copies, or  
> that things haven't become corrupted over time.  Darcs also checksums  
> patches (and uses it as part of the patch name - not sure if it is  
> actually used as a checksum), but this doesn't give me the end-to-end  
> guarantees that I'd like.


No, this is a weakness of darcs' approach.  But it's one that's inherent in
darcs' more flexible approach to version control.  Not that we can't have
integrity checking, but it's not easy like it is in monotone's design (or
git's).

>   Adding checksum capability to Darcs is interesting, because the  
> patches can be changed.  I feel this also makes the end-to-end  
> checksum feature more desirable.  One option would be to include a  
> checksum of the pristine tree in each tag patch and/or checkpoint.   
> One could then modify 'darcs check' to have an option to check the  
> stored checksums as those patches are checked.

This connects to the "hashed inventory" work that I'm engaged in, although
that isn't the same.  In that case, we'll be checksumming the actual
contents of patches, which still doesn't give us end-to-end guarantees, but
does allow you to sign a single file to certify the authenticity of a
repository, which is nice.  It also, in the process (since we're talking
about a repo format transition), provides an opportunity we could use to
make tags store a checksum of the pristine tree, as you suggest.  I hadn't
thought of that, but it's a very good idea.
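
To make the "sign a single file" point concrete, here is a rough Python
sketch. It is not darcs code and not the real inventory format: the choice of
SHA-256, the one-hash-per-line layout, and the names build_inventory and
verify_patches are all assumptions made for illustration.

```python
import hashlib


def sha256_hex(data):
    """Hex SHA-256 of a byte string (the hash function is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_inventory(patches):
    """Hypothetical inventory: one line per patch, holding the patch's hash."""
    return "".join(sha256_hex(p) + "\n" for p in patches)


def verify_patches(inventory, patches):
    """True if every patch matches the hash recorded for it, in order."""
    return inventory.splitlines() == [sha256_hex(p) for p in patches]


# Two made-up patch bodies standing in for real patch files.
patches = [b"addfile ./README\n", b"hunk ./README 1\n+hello\n"]
inventory = build_inventory(patches)

# The inventory pins every patch, so its digest is the one thing to sign.
inventory_digest = sha256_hex(inventory.encode())
```

Signing inventory_digest vouches for the exact patch contents, but says
nothing about whether those patches reproduce a given tree, which is why this
alone is not an end-to-end guarantee.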

Actually putting pristine hashes in the tags is something we could do
independently, if we wanted.  I was thinking to stick the pristine hashes
in the inventory itself, but if we stuck them in the tag patches (which
would be more challenging in terms of format compatibility) then they'd
naturally move around with the tags, which would be more elegant.  We could
perhaps stick them in with a backward-compatible hash stored in the
"long-comment" section, which could be suppressed by newer darcs and at
least wouldn't confuse older darcs.
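
As a rough illustration of the long-comment idea (a Python sketch, not darcs
code): the tree is modelled as nested dicts, the hash is SHA-256, and the
"Pristine-hash: " marker line is entirely hypothetical. None of these come
from an existing darcs format.

```python
import hashlib

MARKER = "Pristine-hash: "  # hypothetical marker, not an existing darcs convention


def hash_tree(tree):
    """Hash a tree given as bytes (a file) or a dict of name -> subtree."""
    if isinstance(tree, bytes):
        return hashlib.sha256(b"file\0" + tree).hexdigest()
    h = hashlib.sha256(b"dir\0")
    for name in sorted(tree):  # sorted, so the hash ignores insertion order
        h.update(name.encode() + b"\0" + hash_tree(tree[name]).encode() + b"\n")
    return h.hexdigest()


def tag_comment_with_hash(comment, tree):
    """Append the pristine hash as an extra line of the tag's long comment."""
    return comment.rstrip("\n") + "\n" + MARKER + hash_tree(tree) + "\n"


def split_tag_comment(long_comment):
    """What a newer client might do: hide the marker line, return the hash."""
    visible, stored = [], None
    for line in long_comment.splitlines():
        if line.startswith(MARKER):
            stored = line[len(MARKER):]
        else:
            visible.append(line)
    return "\n".join(visible), stored


pristine = {"README": b"hello\n", "src": {"Main.hs": b"main = return ()\n"}}
long_comment = tag_comment_with_hash("Release 1.0", pristine)
visible, stored = split_tag_comment(long_comment)
```

An older client would simply show the extra line as part of the comment,
while a check could compare stored against hash_tree of the recreated
pristine tree.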

End-to-end signatures require patches be signed in a canonical form, which
can be tricky.  It's something people are definitely interested in, but
no one seems to have time for.  It'd be very expensive to compute, but you
wouldn't always need to compute it, only when you're signing a patch or
verifying a signature, which needn't be the default.  Another option would
be to sign a patch bundle, which would be far more efficient, but also
quite a bit uglier as you'd then have to store the entire patch bundle
besides just its signature, while the minimal-context canonical patch can
be computed at will.

>   Might this checksum already be there in the git compatibility  
> work?  If I were to write a patch to add this, would it be accepted?

No, the git stuff doesn't have the sort of checksum you'd like.  It's sort
of there, but really implemented by git.  If you wrote a patch to add
hashes of the pristine state to tags (or hashes for tags stored elsewhere)
I'd vote for accepting it (provided, of course, it's clean).  You might
also (if you have lots of time) try your hand at implementing a hashed
pristine cache, which would store a hash of the pristine state (maybe also
of all the files and directories).  This could be used to avoid needing to
store the pristine cache, and to avoid recomputing it unnecessarily (as
happens when running without a pristine cache).  In particular, we could
include the old and new pristine hashes in a patch bundle, so darcs apply
to a hashed pristine repo wouldn't need to actually create the pristine
cache (although we'd lose the check that the patch actually corresponds to
those two hashed states).
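
A minimal Python sketch of that bundle idea, under loud assumptions: Bundle
and HashedRepo are hypothetical names, hashes are opaque strings, and a
patch's context is simplified to "the exact pristine hash it was recorded
against", which ignores commutation entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bundle:
    """Hypothetical patch bundle carrying the old and new pristine hashes."""
    old_pristine: str
    new_pristine: str
    patch: bytes


class HashedRepo:
    """Toy repo that keeps only the hash of pristine, never the tree itself."""

    def __init__(self, pristine_hash):
        self.pristine_hash = pristine_hash
        self.patches = []

    def apply(self, bundle):
        # Cheap context check: no pristine tree has to be created for it.
        if bundle.old_pristine != self.pristine_hash:
            raise ValueError("bundle was made against a different pristine state")
        self.patches.append(bundle.patch)
        # The new hash is taken on trust; nothing verifies that the patch
        # really maps the old state to the new one.
        self.pristine_hash = bundle.new_pristine


repo = HashedRepo("hash-of-state-A")
repo.apply(Bundle("hash-of-state-A", "hash-of-state-B", b"patch 1"))
```

The trusted assignment at the end of apply is exactly the lost check
mentioned above.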

We could also consider (optionally?) storing the hash of pristine even when
pristine is also stored, and using it internally to verify that patches are
in the right context.  I'm thinking of a runtime variant of the existential
type witnesses trick, to verify that we don't try to add a patch to a repo
unless it applies properly.
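
Roughly, and only as a Python sketch: when pristine is stored as well, both
ends of a patch can be checked at runtime. CheckedRepo, ContextError, the
flat path -> contents model of the tree, and the "patch" being a dict of
replaced files are all assumptions for illustration, not how darcs
represents anything.

```python
import hashlib


def hash_files(files):
    """Hash a flat mapping of path -> file contents (both simplified)."""
    h = hashlib.sha256()
    for path in sorted(files):
        h.update(path.encode() + b"\0" + hashlib.sha256(files[path]).digest())
    return h.hexdigest()


class ContextError(Exception):
    """Raised when a patch does not fit the repository's current state."""


class CheckedRepo:
    """Toy repo storing pristine and checking hashes on every apply."""

    def __init__(self, files):
        self.files = dict(files)

    @property
    def pristine_hash(self):
        return hash_files(self.files)

    def apply(self, old_hash, new_hash, changes):
        # The patch must start from the state we are actually in.
        if old_hash != self.pristine_hash:
            raise ContextError("patch is not in the right context")
        candidate = {**self.files, **changes}
        # With pristine stored, the claimed result can be checked too.
        if hash_files(candidate) != new_hash:
            raise ContextError("patch does not produce the claimed state")
        self.files = candidate


repo = CheckedRepo({"README": b"hello\n"})
before = repo.pristine_hash
after = hash_files({"README": b"hello, world\n"})
repo.apply(before, after, {"README": b"hello, world\n"})
```

Applying the same patch a second time raises ContextError, since the
repository is no longer in the state the patch expects.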

If you're thinking of a minimal-context hash, that'd probably also be
accepted (we'd very much like to have that), but that's trickier, and in
particular, we tend to be picky about the user interface and repository
format, so you'd best discuss your design here as you develop it.
-- 
David Roundy
Department of Physics
Oregon State University
