On Mon, Dec 11, 2006 at 11:46:36PM +1100, William Uther wrote:
> It seems that one potential advantage that Monotone has is that it
> has really strong data integrity guarantees. Everything is
> checksummed. Darcs has its advantages too, but I'm guessing I don't
> need to mention them here :).
>
> It doesn't seem that Darcs has a similar integrity guarantee
> (please correct me if I've missed it). There is 'darcs check' that
> can "Check the repository for consistency. Check verifies that the
> patches stored in the repository, when successively applied to an
> empty tree, properly recreate the stored pristine tree." But this
> just checks that the patches and pristine tree are consistent. It
> does not check that they are the same as someone else's copies, or
> that things haven't become corrupted over time. Darcs also checksums
> patches (and uses it as part of the patch name - not sure if it is
> actually used as a checksum), but this doesn't give me the end-to-end
> guarantees that I'd like.

No, this is a weakness of darcs' approach. But it's one that's inherent
in darcs' more flexible approach to version control. That's not to say
we can't have integrity checking, but it's not as easy as it is in
monotone's design (or git's).

> Adding checksum capability to Darcs is interesting, because the
> patches can be changed. I feel this also makes the end-to-end
> checksum feature more desirable. One option would be to include a
> checksum of the pristine tree in each tag patch and/or checkpoint.
> One could then modify 'darcs check' to have an option to check the
> stored checksums as those patches are checked.

This connects to the "hashed inventory" work that I'm engaged in,
although that isn't the same. In that case, we'll be checksumming the
actual contents of patches, which still doesn't give us end-to-end
guarantees, but does allow you to sign a single file to certify the
authenticity of a repository, which is nice. It also, in the process
(since we're talking about a repo format transition), provides an
opportunity we could use to make tags store a checksum of the pristine
tree, as you suggest. I hadn't thought of that, but it's a very good
idea.

Actually putting pristine hashes in the tags is something we could do
independently, if we wanted. I was thinking to stick the pristine
hashes in the inventory itself, but if we stuck them in the tag patches
(which would be more challenging in terms of format compatibility) then
they'd naturally move around with the tags, which would be more
elegant. We could perhaps stick them in with a backward-compatible hash
stored in the "long-comment" section, which could be suppressed by
newer darcs and at least wouldn't confuse older darcs.

End-to-end signatures require patches be signed in a canonical form,
which can be tricky. It's something people are definitely interested
in, but no one seems to have time for.
It'd be very expensive to compute, but you wouldn't always need to
compute it, only when you're signing a patch or verifying a signature,
which needn't be the default. Another option would be to sign a patch
bundle, which would be far more efficient, but also quite a bit uglier
as you'd then have to store the entire patch bundle alongside its
signature, while the minimal-context canonical patch can be computed at
will.

> Might this checksum already be there in the git compatibility
> work? If I were to write a patch to add this, would it be accepted?

No, the git stuff doesn't have the sort of checksum you'd like. It's
sort of there, but really implemented by git. If you wrote a patch to
add hashes of the pristine state to tags (or hashes for tags stored
elsewhere) I'd vote for accepting it (provided, of course, it's clean).

You might also (if you have lots of time) try your hand at implementing
a hashed pristine cache, which would store a hash of the pristine state
(maybe also of all the files and directories). This could be used to
avoid needing to store the pristine cache, and to avoid recomputing it
unnecessarily (as happens when running without a pristine cache). In
particular, we could include the old and new pristine hashes in a patch
bundle, so darcs apply to a hashed pristine repo wouldn't need to
actually create the pristine cache (although we'd lose the check that
the patch actually corresponds to those two hashed states).

We could also consider (optionally?) storing the hash of pristine even
when pristine is also stored, and using it internally to verify that
patches are in the right context. I'm thinking of a runtime variant of
the existential type witnesses trick, to verify that we don't try to
add a patch to a repo unless it applies properly.

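A rough sketch of the pristine-hash idea, in Python purely for
illustration (darcs itself is written in Haskell). The names hash_tree
and check_context, the bundle dictionary with its old_pristine_hash
key, the entry encoding, and the choice of SHA-256 are all assumptions
made for this example, not anything darcs implements. Each directory's
hash covers the sorted names and hashes of its entries, so one hash
stands for the whole pristine tree, and a bundle carrying the old
pristine hash can be refused when the repository is in a different
state.

```python
import hashlib
import os


def hash_file(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def hash_tree(root):
    """Return a hash of a directory covering every file and subdirectory.

    Entries are visited in sorted order so the result does not depend on
    the order in which the filesystem lists them. The "kind hash name"
    line format is a hypothetical encoding chosen for this sketch.
    """
    h = hashlib.sha256()
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        if os.path.isdir(path):
            kind, child = "d", hash_tree(path)
        else:
            kind, child = "f", hash_file(path)
        h.update(f"{kind} {child} {name}\n".encode("utf-8"))
    return h.hexdigest()


def check_context(pristine_dir, bundle):
    """True if the bundle was made against the current pristine state.

    `bundle` is a hypothetical dict holding the pristine hash recorded
    when the bundle was created.
    """
    return hash_tree(pristine_dir) == bundle["old_pristine_hash"]
```

The same hash_tree value is what a tag could record, so that a check
pass could compare the stored hash against the tree it rebuilds. Note
that this sketch ignores symlinks, permissions, and file names
containing newlines, all of which a real format would have to pin down.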
If you're thinking of a minimal-context hash, that'd probably also be
accepted (we'd very much like to have that), but that's trickier, and
in particular, we tend to be picky about the user interface and
repository format, so you'd best discuss your design here as you
develop it.
--
David Roundy
Department of Physics
Oregon State University
_______________________________________________
darcs-devel mailing list
[email protected]
http://www.abridgegame.org/cgi-bin/mailman/listinfo/darcs-devel
