Daniel Carrera <[email protected]> writes: > Ganesh Sittampalam wrote: >> On Fri, 10 Apr 2009, Daniel Carrera wrote: >>> So the order used for the hash is independent of how the patches >>> are arranged in the repository. All that we need is that we agree >>> on which patches we are looking at. So my suggestion only works if >>> the set of dependencies of U is unique up to order. >> >> The dependencies of a patch are unique up to order, and the general >> idea of hashing just U with its dependencies has been discussed >> before (search for phrases like "minimal context"). I can't remember >> what the conclusion was, if any, though. >> >> If you track down the past discussion, it would be worth writing up >> as a wishlist item for the bug tracker if not already there. > > I found a discussion from 2005, starting here: > http://lists.osuosl.org/pipermail/darcs-users/2005-November/008745.html > > Summary: > * David R said it was a good idea. Ideally we'd use some canonical > representation so it'll survive whenever Darcs changes format. > * The cost of verifying a patch would be at worst O(N^2) assuming no > exponentially expensive commutes. If commutation and patch-parsing was > fast, it could probably work fine. > > > I have an optimization idea: The patch could store not just its own > hash, but also the hashes of its immediate dependencies. So Darcs can > skip all of the patches that are not directly relevant. I'll > elaborate: > > Def: By "direct dependency" I mean that patch B depends on A and there > is no patch X such that B depends on X depends on A. > > Imagine we have the following tree: ABCDEFG > > We want to add patch H, which depends on F and B. In turn, F depends > on E and B depends on A, but that's it. Then patch H would say: > > [hash: <hash-of-H>] > [Added feature H. > depends: <hash-of-F> > depends: <hash-of-B> > [email protected]**20090331192735] > > > When Darcs verifies the patch, it skips straight to F and B. When it > gets to F it sees: > > [hash: <hash-of-F>] > [Added feature F. > depends: <hash-of-E> > [email protected]**20090221192735] > > So it goes get patch E as well. Likewise, when it reaches B it finds a > depends:<hash-of-A>' so it grabs A. In this way, Darcs goes straight > to the patches that it needs and doesn't have to spend as much time > figuring out the minimal-context for H. > > The hash for a patch would be the SHA1 sum of the entire patch file, > except for the [hash: ... ] line of course. This means that the hash > of H depends on the hash of F and B, and the hash of F in turn depends > on the hash of E and so on. In this way, the hash for any patch > depends on the entire history of patches needed to produce that > patch's minimal context. > > Once we have a unique hash, we can get digital signatures. So a patch > might actually look like this: > > [hash: <hash-of-H> > sig: <sign-hash-of-H>] > [Added feature H. > depends: <hash-of-F> > depends: <hash-of-B> > [email protected]**20090331192735] > > > What do you think?
I think your scheme works, except that it makes both recording and applying quadratic in the context. Extracting the minimal context takes quadratic time (imagine a patch depending on every other previous patch). Since applying (actually, hash-checking) is going to be quadractic anyway (because we have to reorder the dependences), it might be smarter to compute the minimal context at apply time. [Disclaimer: this is all worst-case complexity, one would have to look at realistic mean-case complexity…] Florent _______________________________________________ darcs-users mailing list [email protected] http://lists.osuosl.org/mailman/listinfo/darcs-users
