Re: [darcs-users] Which commands can change output of darcs query contents for a particular hash?

Max Battcher Fri, 10 Jul 2009 22:59:24 -0700

Gwern Branwen wrote:

On Fri, Jul 10, 2009 at 6:43 PM, Max Battcher<[email protected]> wrote:

I've got a partial implementation of this myself in my "darcsforge" code.
The pristine object names/hashes as cache keys seems a useful tool for
caching historical data. Dealing with new integration states (er, revisions)
is easy (during a cache walk through pristine), and I was fretting how to
deal with historical integration states but just recently had some ideas
involving context file caching.


I certainly think that modeling revision tracking around pristine.hashed is
the current best way forward. I do feel for those that haven't thrown out
the bathwater and trying to shoehorn such a solution into a traditional
revision number/hash-based design. I think gitit would be a different animal
if it were designed with darcs in mind from the beginning.


I'm just a humble filestore dev, but when I look into pristine.hashed,
aren't I seeing hashes like that filestore/gitit use?

Er... No. I'll try to explain as best I can: Basically, gitit/filestore right now is using the hashes of individual patches to represent files at different states in time. In darcs that doesn't quite match actual file states because patches can be reordered, particularly during merges, and may not correspond 1-to-1 to useful/interesting/'real' file states.

Pristine.hashed files are specific hashes based on files at specific points in time with specific contents. Pristine hashes are useful as cache keys because that is essentially what they are: the pristine is darcs' cache for what git/hg might call HEAD or TIP. The pristine hashes don't encode any revision information, so they don't work entirely on their own, but they serve as the current best cache naming scheme in darcs for storing historical versions of files.
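
To make the "cache key" point concrete, here is a minimal Python sketch of a content-addressed store. It is not darcs' actual on-disk format: the use of SHA-256 and the `ObjectCache` name are assumptions made purely for illustration.

```python
import hashlib


class ObjectCache:
    """Toy content-addressed cache (hypothetical; not darcs' real pristine format)."""

    def __init__(self):
        self._objects = {}

    def put(self, content: bytes) -> str:
        # The key depends only on the bytes, never on a patch or revision.
        key = hashlib.sha256(content).hexdigest()
        self._objects[key] = content
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


cache = ObjectCache()
k1 = cache.put(b"hello\n")
k2 = cache.put(b"hello\n")         # same bytes, same key
k3 = cache.put(b"hello, world\n")  # different bytes, different key
```

Because the key is derived from content alone, an entry can safely be cached forever, but the key by itself says nothing about which repository state it belonged to.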

Perhaps an illustration might help... I'll use letters for patches (and their hashes) and numbers for pristine hashes. A simple repo might see a few early patches like so:


Patch  /    a.txt  b.txt
A      00   01     02
B      03   01     04
C      05   06     04

(Keeping in mind that in reality the numbers here are UUIDs/hashes, with timestamps, and during an operation many of the pristine objects listed above will be ephemeral and garbage collected quickly, leaving only the most current (pristine).)

So here we see three patches, one which touches both files a.txt and b.txt (with the containing root directory thrown in for completeness), and two which touch only one file or the other. Across the three patches you end up with 7 pristine objects (4 if you ignore the root directory object). ``darcs show contents b.txt --match C.hash`` will, for the time being, return pristine object 04, but due to patch reordering that may not always be the case. A simple, similarly contrived example, which may not necessarily reflect reality (I'm not actually testing this, merely using an illustration), would be pulling in patches D and E from a branch that doesn't have patch C; you could quite easily end up with:


Patch  /    a.txt  b.txt  c.txt
A      00   01     02
B      03   01     04
D      07   01     08     09
E      0a   01     0b     0c
C      0d   06     0b     0c

At this point, because darcs commuted D and E before C, you suddenly find that ``darcs show contents b.txt --match C.hash``, exactly the same command as before, returns pristine object 0b, not 04! This is obviously not what you want in a cache key, and it can (and will) happen during a push or pull (commutation is a fundamental key in the way darcs operates), not just optimize --reorder. The file states at any given patch hash should not be considered immutable: they can and will change. Patch hashes are currently "good enough" for darcsweb and Trac+darcs, and gitit could probably keep using them, but don't trust that a file state for a given patch hash will always remain the same. (That is, I for one would not use "lifetime" caching of file states based upon just the file name and a patch hash.)
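
The two tables above can be mimicked with a toy model in Python. This is only a sketch under loud assumptions: patches are modelled as "append this text to this file", which is far simpler than real darcs patches and ignores commutation rules entirely; `PATCHES` and `state_at` are hypothetical names, not part of any darcs API.

```python
# Toy patches: each maps a filename to the text that patch appends.
PATCHES = {
    "A": {"a.txt": "a1\n", "b.txt": "b1\n"},
    "B": {"b.txt": "b2\n"},
    "C": {"a.txt": "a2\n"},
    "D": {"b.txt": "b3\n", "c.txt": "c1\n"},
    "E": {"b.txt": "b4\n", "c.txt": "c2\n"},
}


def state_at(order, patch_name):
    """File contents after applying patches in repo order, up to and including patch_name."""
    files = {}
    for name in order:
        for path, text in PATCHES[name].items():
            files[path] = files.get(path, "") + text
        if name == patch_name:
            return files
    raise KeyError(patch_name)


# Same patch C, two different repository orderings.
before = state_at(["A", "B", "C"], "C")
after = state_at(["A", "B", "D", "E", "C"], "C")

# a.txt "at C" is stable, but b.txt "at C" is not.
print(before["a.txt"] == after["a.txt"])
print(before["b.txt"] == after["b.txt"])
```

In this model a cache keyed on ("b.txt", hash of C) would hand back stale contents after the pull, which is exactly the problem with lifetime caching on patch hashes.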

If you want a glimpse at what I've been working on, during small spurts of spare time: I've been trying to find the best way(s) to capture interesting repository states and snapshot pristine as my backing cache. Unfortunately darcs doesn't (yet) make this easy and won't store this information for you; there certainly is no current way to query for specific 'archived' pristine objects, for instance (other than by context file). (...and worse, there is no way at all to query for useful/meaningful historical states other than tags.) So rather than try to represent both scenarios above by referring to patch C's hash, I'm hoping to have some nice interface to provide something like:


Repo State                                a.txt  b.txt  c.txt  Context
...
2009-06-10 Pulled in first three patches  06     04            [A, B, C]
2009-06-11 Pulled in D and E from branch  06     0b     0c     [A, B, D, E, C]
...

Wouldn't you know it, but my "integration log" thing here looks somewhat like a merge log... I know others have pushed for darcs to store an explicit merge log before, and maybe it is time to revisit those discussions. Certainly we can build such a beast as a third-party component and iterate on it without it necessarily being a part of darcs' codebase.
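
A third-party integration log along these lines could be sketched roughly as follows in Python. Everything here is hypothetical (the `IntegrationLog` class, its methods, and the truncated SHA-256 used as a stand-in for pristine object names); it is not code from darcsforge or darcs, just one way the table above might be represented.

```python
import hashlib


def short_hash(text):
    # Stand-in for a pristine object name; real darcs hashes look different.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()[:8]


class IntegrationLog:
    """Hypothetical log of repository states, keyed by patch context."""

    def __init__(self):
        self.entries = []

    def record(self, date, note, context, files):
        # context: ordered patch names; files: {path: contents} of pristine.
        snapshot = {path: short_hash(text) for path, text in files.items()}
        self.entries.append((date, note, tuple(context), snapshot))

    def lookup(self, path, context):
        # Return the cached object name for path at exactly this context.
        wanted = tuple(context)
        for _date, _note, ctx, snapshot in reversed(self.entries):
            if ctx == wanted:
                return snapshot.get(path)
        return None


log = IntegrationLog()
log.record("2009-06-10", "Pulled in first three patches",
           ["A", "B", "C"],
           {"a.txt": "a1\na2\n", "b.txt": "b1\nb2\n"})
log.record("2009-06-11", "Pulled in D and E from branch",
           ["A", "B", "D", "E", "C"],
           {"a.txt": "a1\na2\n", "b.txt": "b1\nb2\nb3\nb4\n", "c.txt": "c1\nc2\n"})

old_b = log.lookup("b.txt", ["A", "B", "C"])
new_b = log.lookup("b.txt", ["A", "B", "D", "E", "C"])
```

The lookup key is the whole context rather than a single patch name, so the two states of b.txt stay distinguishable even though both could be described as "the state at patch C".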


--
--Max Battcher--
http://worldmaker.net
_______________________________________________
darcs-users mailing list
[email protected]
http://lists.osuosl.org/mailman/listinfo/darcs-users
