Re: Moved files and merges

Linus Torvalds Mon, 05 Sep 2005 08:48:45 -0700


On Mon, 5 Sep 2005, H. Peter Anvin wrote:
> 
> It would also have the somewhat interesting possibility that one could 
> "remove and recreate" a file and have it exist as a different entity. 
> That probably needs to be a user option.


It's a totally broken model. Really.

You think it solves issues, but it just creates more bugs and problems 
than it solves.

Trust me. The whole point of git is that "content is the only thing that 
matters", and that there isn't any other meta-data. If you break that 
fundamental assumption, everything git does so well will break. 

I think we've already shown that the "content matters" approach works.  I
claim that the git rename tracking works better than any other SCM out 
there, _exactly_ because it doesn't make the mistake of trying to track 
anything but content.

The "moved + modified files" is not anything special. The current 
automatic merger may not handle it, but that's not because it _can't_ 
handle it, it's because it tries to be simple and efficient. 

And because it's so _incredibly_ fast for all the normal cases, you can 
now spend some effort on figuring out renames dynamically for the few 
cases where it fails. Does it do so now? No. Would adding UUID's help? 
Hell no. It would be just an unmitigated disaster.

Exactly the same way "git-diff-tree" can figure out renames, a merge 
algorithm can figure them out. 

Right now, we have two stages in merges: we try the trivial merge first
(pure "git-read-tree"), and when that fails, we try the automatic 3-way
merge. The fact that we don't have a third (and fourth, and fifth) merge
algorithm for when those two trivial merges happen to not work is _not_ an
indication that the "contents only" approach doesn't work - it's just an
indication of the fact that 99.9% of all merges are trivial, and they
should be optimized for.

So the next step is _not_ to do UUID's, it's to notice that merge errors 
happened, and try to figure out why. Right now we just give up and say 
"sort it out by hand". That's actually a perfectly valid approach even in 
the presence of moved files - it's a bit painful, but once you _do_ sort 
it out and commit the merge, especially if you can push the merge back (so 
that both sides then agree on the final rename), future merges will be 
trivial again - ie you won't have to go through it over and over again.

Of course, if you don't push it back, but keep the two trees separate and 
keep on modifying files that have different names in the other repository, 
you'll keep on getting into the situation that the trivial merge doesn't 
work. So we _do_ want to get an automated "phase 3" (and maybe 4..) merge 
that can figure out renames, but the point here is that it's something we 
_can_ figure out.

For example, one way of doing it is to just do the exact merge we do now,
and then look at the files that didn't merge. Do a cross-diff between such
files and new/deleted files (if not _exactly_ the way we do for "git diff
-M", then at least it's exactly the same concept), and try to do a
three-way merge where the base/first/second pairs don't have the same
name.

For example, let's say that you have the common commit A, and file "x",
and two paths (B and C) where B has renamed the file "x" to "y", and C has
modified file "x". You end up with the scenario that our trivial merge
fails to handle, and right now we give up, and don't help the user very
much at all. But the _solution_ is not to change "read-tree" to know about
renames, nor is it to make git keep any new data. The solution is to just 
make phase 3 say:

 - "Automatic merge failed, trying rename merge"
 - go through all files that exist in C but not in B (or vice versa), and 
   pair them up with all files that exist in B but not in C (or vice
   versa) and see if _they_ can be handled as a three-way merge. And 
   exactly the same way that we do the rename detection, we may want to
   find the "optimal pairing" by looking at the distance between the
   files.
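
To make the pairing step concrete, here is a rough sketch in Python. It is
not how git does it: the trees are assumed to be plain dicts mapping a path
to the file contents, difflib's ratio stands in for the "distance" between
files, the 0.5 threshold is an arbitrary choice, and the pairing is greedy
(best score first) rather than provably optimal. The names similarity() and
pair_renames() are made up for the illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 score; 1.0 means identical contents (assumed metric)."""
    return SequenceMatcher(None, a, b).ratio()


def pair_renames(tree_b: dict, tree_c: dict, threshold: float = 0.5) -> list:
    """Pair paths that exist only in B with paths that exist only in C.

    Greedy approximation: score every candidate pair, then accept the
    best-scoring pairs first, using each path at most once.
    """
    only_b = sorted(set(tree_b) - set(tree_c))
    only_c = sorted(set(tree_c) - set(tree_b))

    candidates = sorted(
        ((similarity(tree_b[pb], tree_c[pc]), pb, pc)
         for pb in only_b for pc in only_c),
        reverse=True,
    )

    pairs, used_b, used_c = [], set(), set()
    for score, pb, pc in candidates:
        if score < threshold:
            break  # everything after this is even less similar
        if pb in used_b or pc in used_c:
            continue
        pairs.append((pb, pc))
        used_b.add(pb)
        used_c.add(pc)
    return pairs


# B renamed "x" to "y"; C kept the name "x" but modified the contents.
tree_b = {"y": "line1\nline2\nline3\n", "README": "r"}
tree_c = {"x": "line1\nline2 changed\nline3\n", "README": "r"}
pairs = pair_renames(tree_b, tree_c)
```

Each resulting pair is then a candidate for a three-way merge against
whatever the common commit called that file.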

Notice? This will automatically handle the "renamed in one branch, 
modified in another" case. In fact, if the renamer modified it too, that's 
not a problem at all - the three-way merge will work exactly the same way 
it does now for a non-moved "modified in both" file.
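
A minimal sketch of that three-way step with mismatched names, again in
Python and again only an illustration: trees are assumed to be dicts of
path to contents, merge_blob() and rename_merge() are hypothetical names,
and only the whole-file rule is shown. When both sides changed the contents
differently, merge_blob() returns None and a real line-level merge would
have to take over, which is not shown here.

```python
from typing import Optional


def merge_blob(base: str, ours: str, theirs: str) -> Optional[str]:
    """Whole-file three-way rule; None means a content-level merge is needed."""
    if ours == theirs:
        return ours      # both sides agree
    if ours == base:
        return theirs    # only "theirs" changed the contents
    if theirs == base:
        return ours      # only "ours" changed the contents
    return None          # both changed differently


def rename_merge(tree_a: dict, tree_b: dict, tree_c: dict,
                 old_name: str, new_name: str) -> Optional[str]:
    """B renamed old_name to new_name; C still uses old_name.

    The base/first/second contents come from different paths, but the
    merge itself only ever looks at contents.
    """
    return merge_blob(tree_a[old_name], tree_b[new_name], tree_c[old_name])


# Common commit A has "x"; B renamed it to "y"; C modified "x".
tree_a = {"x": "one\ntwo\n"}
tree_b = {"y": "one\ntwo\n"}
tree_c = {"x": "one\ntwo\nthree\n"}

merged = rename_merge(tree_a, tree_b, tree_c, "x", "y")
result_tree = {"y": merged}  # C's change lands under B's new name
```

Nothing in it needs to know that a rename happened, beyond being told which
two paths to compare.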

Problem solved. Without complicating the trivial (and very common) cases, 
and without introducing any new metadata that is fundamentally impossible 
to maintain (and it _is_ fundamentally impossible to maintain, because it 
has nothing to do with the contents of the files, so patches etc will by 
definition break it).

                Linus
