Re: using git directory cache code in darcs?

David Roundy Sun, 17 Apr 2005 05:22:23 -0700

On Sat, Apr 16, 2005 at 03:43:02PM -0700, Linus Torvalds wrote:
> On Sat, 16 Apr 2005, David Roundy wrote:
> > 1) Would this actually be a good idea? It seems good to me, but there may
> > be other considerations that I haven't thought of.
> 
> I really don't know how well the git index file will work with darcs, and
> the main issue is that the index file names the "stable copy" using the
> sha1 hash. If darcs uses something else (and I imagine it does) you'd
> need to do a fair amount of surgery, and I suspect merging changes won't
> be very easy.


Oh, I'm starting to see (having just browsed the git code for another half
hour or so)... I had been under the (false) impression that the index file
stored the contents of the files themselves, which in retrospect doesn't
make any sense.  So when you run update-cache --add, the file data itself
immediately goes into its final hashed location, and only the sha1 info
goes into the index.
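
To make that split concrete, here is a minimal sketch of a content-addressed
store with an index that holds only hashes.  It is an illustration under
stated assumptions, not git's real code or on-disk format: the class and
method names (ToyStore, add, read) are hypothetical, and the hash is a plain
sha1 of the raw file contents, which may not match what git actually hashes.

```python
import hashlib


class ToyStore:
    """Toy content-addressed store plus an index of path -> sha1 (not git's format)."""

    def __init__(self):
        self.objects = {}  # hex sha1 -> file contents (the "final hashed location")
        self.index = {}    # path -> hex sha1 (the index never holds file data)

    def add(self, path, data):
        # Store the data under its hash right away; the index keeps only the hash.
        digest = hashlib.sha1(data).hexdigest()
        self.objects[digest] = data
        self.index[path] = digest
        return digest

    def read(self, path):
        # Look up the hash in the index, then fetch the contents by hash.
        return self.objects[self.index[path]]
```

Two paths with identical contents share one stored object, since the object
is keyed by its hash rather than by its name.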

That's all right.  Darcs would only access the cached data through a
git-caching layer, and we've already got an abstraction layer over the
pristine cache.  As long as the git layer can quickly retrieve the contents
of a given file, we should be fine.

The sha1 file and tree hashing isn't directly useful for darcs, but people
will want to interoperate with git, and for that it would be nice to be
able to know what the hash of a given version is.  I imagine something like

darcs tag --git

which would tag the current version with its git hash.  Of course, to
implement that we only need to reproduce your algorithm for hashing trees,
which probably would be easier to do ourselves without using any git
code... but it would be far faster to recompute with the git backend, since
git stores the hashes of all the unmodified files, and since I also imagine

darcs record --git

which would record a change, and then tag the resulting tree with a git
hash, we might be recomputing the git hashes reasonably often, and we
certainly don't want to rehash the entire kernel each time! :)
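
The saving from cached hashes can be sketched roughly as follows.  This is
only an illustration: HashCache, file_hash and tree_hash are hypothetical
names, the "unchanged" test (mtime and size) is an assumption about how one
might detect modification, and the tree hash below is a toy construction, not
git's tree-hashing algorithm.

```python
import hashlib
import os


class HashCache:
    """Remember each file's sha1 together with the stat data it was computed from."""

    def __init__(self):
        self.entries = {}  # path -> ((mtime_ns, size), hex sha1)
        self.rehashed = 0  # how many times a file was actually read and hashed

    def file_hash(self, path):
        st = os.stat(path)
        key = (st.st_mtime_ns, st.st_size)
        cached = self.entries.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]  # stat data unchanged: reuse the stored hash
        with open(path, "rb") as f:
            digest = hashlib.sha1(f.read()).hexdigest()
        self.rehashed += 1
        self.entries[path] = (key, digest)
        return digest

    def tree_hash(self, paths):
        # Toy tree hash: sha1 over sorted "path hash" lines (NOT git's tree format).
        h = hashlib.sha1()
        for p in sorted(paths):
            h.update(f"{p} {self.file_hash(p)}\n".encode())
        return h.hexdigest()
```

After the first call, recomputing the tree hash only reads the files whose
stat data has changed; everything else is a dictionary lookup.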

> So it might well make sense to wait a bit, until the git thing has calmed
> down some more. For example, I made some rather large changes
> (conceptually, if not in layout of the physical file) to the index file
> just yesterday, since git now uses it for merging too.
> 
> In git, the index file isn't just a speedup, it's the "work" file _and_
> the merge entity. It's not just a floor wax, it's a dessert topping too!

I think that sounds like a pretty reasonable match.  In darcs, there are
internally two main datatypes.  One is the Patch (as you might imagine),
and the other is called a "Slurpy", which is basically a tree lazily
"slurped" into memory.

The pristine cache is then just a way of storing the tree so that we can
"slurp" it again later to retrieve the current state.  So in a sense we'd
be using only one side of the index file interface, the "working directory"
side, where you check files out and add files in--treating it as a
fast filesystem with a few extra-fancy features (like storing inodes of the
files in the working directory).
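
As a rough picture of what a lazily "slurped" tree means, here is a small
sketch.  It is not the darcs Slurpy type (darcs is written in Haskell, where
the laziness comes from the language itself); LazyTree and its methods are
hypothetical names, and the explicit caching below only imitates the idea of
reading directories and files on first access.

```python
import os


class LazyTree:
    """A directory tree whose entries and file contents are read on first access."""

    def __init__(self, path):
        self.path = path
        self._children = None  # directory listing, filled in lazily
        self._data = None      # file contents, filled in lazily

    def is_dir(self):
        return os.path.isdir(self.path)

    def children(self):
        # List the directory only the first time someone asks for it.
        if self._children is None:
            self._children = {
                name: LazyTree(os.path.join(self.path, name))
                for name in sorted(os.listdir(self.path))
            }
        return self._children

    def contents(self):
        # Read the file only the first time its contents are needed.
        if self._data is None:
            with open(self.path, "rb") as f:
                self._data = f.read()
        return self._data
```

A backend that can answer children() and contents() quickly, whether from a
plain directory or from a hashed object store, would fit behind the same
interface.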

> I think libgit might make sense, but again, not quite yet. Maybe the new
> merge model was my last smart thought even on the subject of SCM's (I kind
> of hope so), but maybe it's not.
> 
> My gut _feel_ is that the basic git low-level architecture is done, and
> you can certainly start looking around and see if it matches darcs at all. 

Sounds good.  That's sort of the feel I had gotten from other people's
responses as well.  We'll definitely look into how we can use (and
interface with) git.
-- 
David Roundy
http://www.darcs.net