Gwern Branwen wrote:
On Fri, Jul 10, 2009 at 6:43 PM, Max Battcher<[email protected]> wrote:
I've got a partial implementation of this myself in my "darcsforge" code.
The pristine object names/hashes as cache keys seems a useful tool for
caching historical data. Dealing with new integration states (er, revisions)
is easy (during a cache walk through pristine), and I was fretting how to
deal with historical integration states but just recently had some ideas
involving context file caching.

I certainly think that modeling revision tracking around pristine.hashed is
the current best way forward. I do feel for those that haven't thrown out
the bathwater and are trying to shoehorn such a solution into a traditional
revision number/hash-based design. I think gitit would be a different animal
if it were designed with darcs in mind from the beginning.

I'm just a humble filestore dev, but when I look into pristine.hashed,
aren't I seeing hashes like those that filestore/gitit use?

Er... No. I'll try to explain as best I can: Basically, gitit/filestore right now is using the hashes of individual patches to represent files at different states in time. In darcs that doesn't quite match actual file states because patches can be reordered, particularly during merges, and may not correspond 1-to-1 to useful/interesting/'real' file states.

Pristine.hashed files are specific hashes based on files at specific points in time with specific contents. Pristine hashes are useful as cache keys because that is essentially what they are: the pristine is darcs' cache for what git/hg might call HEAD or TIP. The pristine hashes don't encode any revision information, so they don't work entirely on their own, but they serve as the current best cache naming scheme in darcs for storing historical versions of files.

Perhaps an illustration might help... I'll use letters for patches (and their hashes) and numbers for pristine hashes. A simple repo might see a few early patches like so:

Patch  /    a.txt  b.txt
A      00   01     02
B      03   01     04
C      05   06     04


(Keeping in mind that in reality the numbers here are UUIDs/hashes, with timestamps, and during an operation many of the pristine objects listed above will be ephemeral and garbage collected quickly leaving only the most current (pristine).)

So here we see three patches, one which touches both files a.txt and b.txt (with the containing root directory thrown in for completeness), and two which touch only one file or the other. Across the three patches you end up with 7 pristine objects (4 if you ignore the root directory objects). ``darcs show contents b.txt --match C.hash`` will, for the time being, return pristine object 04, but due to patch reordering that may not always be the case. A simple, similarly contrived example that may not necessarily reflect reality (I'm not actually testing this, merely using an illustration): pull in patches D and E from a branch that doesn't have patch C, and you could quite easily end up with:

Patch  /    a.txt  b.txt  c.txt
A      00   01     02
B      03   01     04
D      07   01     08     09
E      0a   01     0b     0c
C      0d   06     0b     0c

At this point, because darcs commuted D and E before C, ``darcs show contents b.txt --match C.hash``, run exactly as before, suddenly returns pristine object 0b, not 04! This is obviously not what you want in a cache key, and it can (and will) happen during a push or pull (commutation is fundamental to the way darcs operates), not just optimize --reorder. The file states at any given patch hash should not be considered immutable: they can and will change. Patch hashes are currently "good enough" for darcsweb and Trac+darcs, and gitit could probably keep using them, but don't trust that a file state for a given patch hash will always remain the same. (That is, I for one would not use "lifetime" caching of file states based upon just the file name and a patch hash.)
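
To make the hazard concrete, here is a small Python sketch of the two tables above. It only models the illustration and is not how darcs works internally: the dictionaries, ``lookup`` and ``unstable_keys`` are hypothetical names, and the values are the placeholder numbers from the tables, not real hashes.

```python
# Pristine object per file as seen "at" each patch, copied from the two
# illustrative tables. Placeholder values, not real darcs hashes.
BEFORE_PULL = {
    "A": {"a.txt": "01", "b.txt": "02"},
    "B": {"a.txt": "01", "b.txt": "04"},
    "C": {"a.txt": "06", "b.txt": "04"},
}
AFTER_PULL = {
    "A": {"a.txt": "01", "b.txt": "02"},
    "B": {"a.txt": "01", "b.txt": "04"},
    "D": {"a.txt": "01", "b.txt": "08", "c.txt": "09"},
    "E": {"a.txt": "01", "b.txt": "0b", "c.txt": "0c"},
    "C": {"a.txt": "06", "b.txt": "0b", "c.txt": "0c"},
}


def lookup(repo, path, patch):
    """Hypothetical stand-in for asking for the contents of `path` at `patch`."""
    return repo[patch].get(path)


def unstable_keys(old, new):
    """Return (path, patch) cache keys whose pristine object differs between states."""
    return sorted(
        (path, patch)
        for patch, files in old.items()
        if patch in new
        for path, obj in files.items()
        if new[patch].get(path) != obj
    )


if __name__ == "__main__":
    print(lookup(BEFORE_PULL, "b.txt", "C"))        # 04
    print(lookup(AFTER_PULL, "b.txt", "C"))         # 0b
    print(unstable_keys(BEFORE_PULL, AFTER_PULL))   # [('b.txt', 'C')]
```

The same (file name, patch hash) key resolves to two different pristine objects depending on when you ask, which is the reason not to cache on that pair alone.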

If you want a glimpse at what I've been working on, during small spurts of spare time: I've been trying to find the best way(s) to capture interesting repository states and snapshot pristine as my backing cache. Unfortunately darcs doesn't (yet) make this easy and won't store this information for you; there certainly is no current way to query for specific 'archived' pristine objects, for instance (other than by context file). (...and worse there is no way at all to query for useful/meaningful historical states other than tags.) So rather than try to represent both scenarios above by referring to patch C's hash, I'm hoping to have some nice interface to provide something like:

Repo State                                a.txt  b.txt  c.txt  Context
...
2009-06-10 Pulled in first three patches  06     04            [A, B, C]
2009-06-11 Pulled in D and E from branch  06     0b     0c     [A, B, D, E, C]
...

Wouldn't you know it, but my "integration log" thing here looks somewhat like a merge log... I know others have pushed for darcs to store an explicit merge log before, and maybe it is time to revisit those discussions. Certainly we can build such a beast as a third-party component and iterate on it without it necessarily being a part of darcs' codebase.
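
As a rough idea of how small such a third-party log could start out, here is a Python sketch of the table above. Everything in it is an assumption for illustration: ``RepoState``, ``IntegrationLog``, ``record`` and ``pristine_at`` are made-up names, states are assumed to be recorded in chronological order with ISO dates, and nothing here talks to darcs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RepoState:
    """One row of the integration log: a snapshot worth remembering."""
    date: str                    # ISO date, e.g. "2009-06-10"
    note: str                    # human-readable description
    files: Dict[str, str]        # path -> pristine object (the cache key)
    context: Tuple[str, ...]     # ordered patch ids at this state


class IntegrationLog:
    """Append-only list of repository states, assumed chronological."""

    def __init__(self) -> None:
        self._states: List[RepoState] = []

    def record(self, date, note, files, context) -> RepoState:
        state = RepoState(date, note, dict(files), tuple(context))
        self._states.append(state)
        return state

    def pristine_at(self, date: str, path: str) -> Optional[str]:
        """Pristine object for `path` in the latest state on or before `date`."""
        found = None
        for state in self._states:
            if state.date <= date:   # ISO dates compare correctly as strings
                found = state.files.get(path)
        return found


log = IntegrationLog()
log.record("2009-06-10", "Pulled in first three patches",
           {"a.txt": "06", "b.txt": "04"}, ["A", "B", "C"])
log.record("2009-06-11", "Pulled in D and E from branch",
           {"a.txt": "06", "b.txt": "0b", "c.txt": "0c"},
           ["A", "B", "D", "E", "C"])
```

Here the lookup is by repository state rather than by patch hash, so ``log.pristine_at("2009-06-10", "b.txt")`` keeps answering 04 even after the later pull changes what patch C's hash would give you.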

--
--Max Battcher--
http://worldmaker.net
