On Thu, 2005-04-14 at 11:11 -0700, Linus Torvalds wrote:
> So, you really need to think of git as a filesystem. You can then 
> implement an SCM _on_top_of_it_, which means that your second suggestion 
> is not only acceptable, it really is the _only_ way to handle this in git:
> 
> > So a commit involving a rename would look something like this...
> > 
> >     tree 82ba574c85e9a2e4652419c88244e9dd1bfa8baa
> >     parent bb95843a5a0f397270819462812735ee29796fb4
> >     rename foo.c bar.c
> >     author David Woodhouse <[EMAIL PROTECTED]> 1113499881 +0100
> >     committer David Woodhouse <[EMAIL PROTECTED]> 1113499881 +0100
> >     Rename foo.c to bar.c and s/foo_/bar_/g
> 
> Except I want that empty line in there, and I want it in the "free-form"  
> section. The "rename" part really isn't part of the git header. It's not 
> what git tracks, it was tracked by an SCM system on top of git.

Note that not only may you have a _set_ of renames, but you'll also have
a _different_ set of renames for each parent. Consider the
representation of a merge where a file was called 'foo' in one parent,
'bar' in the other, and we called it 'foobar' in the resulting tree.

That's the main reason I wanted the renames in with the parent
information -- so it's <parent><rename><rename><...><parent><rename>...

I see your point though and I can't be bothered to argue for the sake of
the slight efficiency benefit we might gain from doing it that way. The
implementation details really aren't that interesting right now.

Let us assume, however, that we have this information somehow stored in
each commit object. It's perfectly sufficient from the POV of the 
'git revtool' which I've been poking at; is it good enough for merges?

Consider a simple case: A branch is taken, file foo.c is renamed to
bar.c, and now we're trying to merge that branch back into the head,
which has moved on. 

We can't just take 'bar.c' as a new file -- we have to track it all the
way back to its inception, and notice that it actually shares a common
ancestor with 'foo.c' in the other parent of the merge.

How feasible, and how computationally expensive, is that task going to
be? Especially given that there may be _many_ new files that we need to
attempt to tie up with their partners, across many potential renames. 

One option for optimising this, if we really need to, might be to track
the file back to its _first_ ancestor and use that as an identification.
The SCM could store that identifier in the blob itself, or we could
consider it an 'inode number' and store it in git's tree objects.

If we can avoid that, however, it would be nice. How feasible is the
merge going to be without it?

-- 
dwmw2


-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to