Gwern Branwen wrote:
> On Fri, Jul 10, 2009 at 6:43 PM, Max Battcher<[email protected]> wrote:
>> I've got a partial implementation of this myself in my "darcsforge" code.
>> The pristine object names/hashes as cache keys seem a useful tool for
>> caching historical data. Dealing with new integration states (er, revisions)
>> is easy (during a cache walk through pristine), and I was fretting how to
>> deal with historical integration states but just recently had some ideas
>> involving context file caching.
>> I certainly think that modeling revision tracking around pristine.hashed is
>> the current best way forward. I do feel for those that haven't thrown out
>> the bathwater and are trying to shoehorn such a solution into a traditional
>> revision number/hash-based design. I think gitit would be a different animal
>> if it were designed with darcs in mind from the beginning.
> I'm just a humble filestore dev, but when I look into pristine.hashed,
> aren't I seeing hashes like that filestore/gitit use?
Er... No. I'll try to explain as best I can: Basically, gitit/filestore
right now is using the hashes of individual patches to represent files
at different states in time. In darcs that doesn't quite match actual
file states because patches can be reordered, particularly during
merges, and may not 1-to-1 correspond to useful/interesting/'real' file
states.
Pristine.hashed files are specific hashes based on files at specific
points in time with specific contents. Pristine hashes are useful as
cache keys because that is essentially what they are: the pristine is
darcs' cache for what git/hg might call HEAD or TIP. The pristine hashes
don't encode any revision information, so they don't work entirely on
their own, but they serve as the current best cache naming scheme in
darcs for storing historical versions of files.
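
A minimal sketch of the "pristine hash as cache key" idea, in Python. This
is not darcs or darcsforge code: the ``ContentCache`` class is hypothetical,
and SHA-256 is used only as a stand-in for whatever hash the real pristine
store uses. The point is just that a key derived from contents never goes
stale.

```python
import hashlib


class ContentCache:
    """Toy content-addressed cache (hypothetical, not darcs' on-disk format)."""

    def __init__(self):
        self._objects = {}

    def put(self, contents: bytes) -> str:
        # The key depends only on the contents, so identical contents
        # always map to the same cache entry.
        key = hashlib.sha256(contents).hexdigest()
        self._objects[key] = contents
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


cache = ContentCache()
k1 = cache.put(b"hello\n")
k2 = cache.put(b"hello\n")        # same contents, same key
k3 = cache.put(b"hello world\n")  # different contents, different key
```

Note that such a key says nothing about which revision the contents belong
to; that mapping has to be stored somewhere else.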
Perhaps an illustration might help... I'll use letters for patches (and
their hashes) and numbers for pristine hashes. A simple repo might see
a few early patches like so:

    Patch   /    a.txt   b.txt
    A       00   01      02
    B       03   01      04
    C       05   06      04

(Keeping in mind that in reality the numbers here are UUIDs/hashes, with
timestamps, and during an operation many of the pristine objects listed
above will be ephemeral and garbage collected quickly leaving only the
most current (pristine).)
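
The table above can be modeled as a small mapping to count the distinct
pristine objects it implies. This is only a sketch of the illustration: the
two-digit ids are the made-up ones from the table, and ``distinct_objects``
is a hypothetical helper.

```python
# Rows of the table above: patch -> pristine id per path ("/" is the root).
table = {
    "A": {"/": "00", "a.txt": "01", "b.txt": "02"},
    "B": {"/": "03", "a.txt": "01", "b.txt": "04"},
    "C": {"/": "05", "a.txt": "06", "b.txt": "04"},
}


def distinct_objects(rows, ignore=()):
    """Set of pristine ids appearing in the table, skipping ignored paths."""
    return {
        obj
        for row in rows.values()
        for path, obj in row.items()
        if path not in ignore
    }


all_objects = distinct_objects(table)
file_objects = distinct_objects(table, ignore=("/",))
print(len(all_objects), len(file_objects))  # prints: 7 4
```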
So here we see three patches, one which touches both files a.txt and
b.txt (with the containing root directory thrown in for completeness),
and two which touch only one file or the other. Across the three patches
you end up with 7 pristine objects (4 if you ignore the root directory
object). ``darcs show contents b.txt --match C.hash`` will, for the time
being, return pristine object 04, but due to patch reordering that may
not always be the case. A simple, similarly contrived example (one that
may not necessarily reflect reality; I'm not actually testing this,
merely using an illustration) would be pulling in patches D and E from a
branch that doesn't have patch C. You could quite easily end up with:

    Patch   /    a.txt   b.txt   c.txt
    A       00   01      02
    B       03   01      04
    D       07   01      08      09
    E       0a   01      0b      0c
    C       0d   06      0b      0c

At this point, because darcs commuted D and E before C, the same command
as before, ``darcs show contents b.txt --match C.hash``, suddenly returns
pristine object 0b, not 04! This is obviously not what you want in a
cache key, and it can (and will) happen during a push or pull
(commutation is fundamental to the way darcs operates), not just during
optimize --reorder. The file states at any given patch hash should not
be considered immutable: they can and will change. Patch hashes are
currently "good enough" for darcsweb and Trac+darcs, and gitit could
probably keep using them, but don't trust that a file state for a given
patch hash will always remain the same. (That is, I for one would not
use "lifetime" caching of file states based upon just the file name and
a patch hash.)
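
To make the staleness concrete, here is a sketch that compares the two
illustrative tables and finds which (file name, patch) cache keys would now
point at a different pristine object. The data is the made-up data from the
tables (root directory column omitted), and ``unstable_keys`` is a
hypothetical helper, not anything darcs provides.

```python
# File states per patch before and after pulling D and E.
before_pull = {
    "A": {"a.txt": "01", "b.txt": "02"},
    "B": {"a.txt": "01", "b.txt": "04"},
    "C": {"a.txt": "06", "b.txt": "04"},
}
after_pull = {
    "A": {"a.txt": "01", "b.txt": "02"},
    "B": {"a.txt": "01", "b.txt": "04"},
    "D": {"a.txt": "01", "b.txt": "08", "c.txt": "09"},
    "E": {"a.txt": "01", "b.txt": "0b", "c.txt": "0c"},
    "C": {"a.txt": "06", "b.txt": "0b", "c.txt": "0c"},
}


def unstable_keys(old, new):
    """(path, patch) keys whose pristine object differs between orderings."""
    return sorted(
        (path, patch)
        for patch, files in old.items()
        for path, obj in files.items()
        if patch in new and new[patch].get(path) != obj
    )


# A cache filled before the pull, keyed by (file name, patch hash).
naive_cache = {("b.txt", "C"): before_pull["C"]["b.txt"]}
stale = naive_cache[("b.txt", "C")] != after_pull["C"]["b.txt"]

print(unstable_keys(before_pull, after_pull))  # prints: [('b.txt', 'C')]
```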
If you want a glimpse at what I've been working on, during small spurts
of spare time: I've been trying to find the best way(s) to capture
interesting repository states and snapshot pristine as my backing cache.
Unfortunately darcs doesn't (yet) make this easy and won't store this
information for you; there certainly is no current way to query for
specific 'archived' pristine objects, for instance (other than by
context file). (...and worse there is no way at all to query for
useful/meaningful historical states other than tags.) So rather than try
to represent both scenarios above by referring to patch C's hash, I'm
hoping to have some nice interface to provide something like:

    Repo State                                 a.txt  b.txt  c.txt  Context
    ...
    2009-06-10 Pulled in first three patches   06     04            [A, B, C]
    2009-06-11 Pulled in D and E from branch   06     0b     0c     [A, B, D, E, C]
    ...

Wouldn't you know it, but my "integration log" thing here looks somewhat
like a merge log... I know others have pushed for darcs to store an
explicit merge log before, and maybe it is time to revisit those
discussions. Certainly we can build such a beast as a third-party
component and iterate on it without it necessarily being a part of
darcs' codebase.
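
One possible shape for such an integration log, sketched as a third-party
data structure. Everything here is an assumption for illustration: the
``RepoState`` and ``IntegrationLog`` names, the fields, and the lookup by
date are hypothetical and not an existing darcs interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class RepoState:
    date: str                 # ISO date, so string comparison orders states
    note: str
    context: Tuple[str, ...]  # ordered patch names at this state
    files: Dict[str, str]     # path -> pristine object id


class IntegrationLog:
    """Append-only log of interesting repository states (hypothetical)."""

    def __init__(self):
        self._states: List[RepoState] = []

    def record(self, state: RepoState) -> None:
        # States are assumed to be recorded in chronological order.
        self._states.append(state)

    def object_for(self, path: str, date: str) -> Optional[str]:
        """Pristine id of path in the latest state on or before date."""
        latest = None
        for state in self._states:
            if state.date <= date:
                latest = state
        return latest.files.get(path) if latest else None


log = IntegrationLog()
log.record(RepoState("2009-06-10", "Pulled in first three patches",
                     ("A", "B", "C"), {"a.txt": "06", "b.txt": "04"}))
log.record(RepoState("2009-06-11", "Pulled in D and E from branch",
                     ("A", "B", "D", "E", "C"),
                     {"a.txt": "06", "b.txt": "0b", "c.txt": "0c"}))
```

With the two states above, ``log.object_for("b.txt", "2009-06-10")`` gives
04 and the same query for 2009-06-11 gives 0b, without ever referring to
patch C's hash.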
--
--Max Battcher--
http://worldmaker.net