On Thu, Sep 20, 2012 at 11:20:31PM -0400, Cristian Tibirna wrote:

> Running the script in attachment produces a git repository in which were 
> operated a large number of file renames, in which many of the renamed files 
> (in this particular case all) have the same content but different names.
> The commit data from the renaming operation (last commit in the script-
> generated history) is inexactly rendered by the command 
> git diff-tree -r -C master
> The logical result is correctly produced by the more restricted command
> git diff-tree -r -M master
> IMO for this particular last commit both the above commands should return the 
> same result.

Interesting. I get the same results from both commands. But I did have
to munge your script, as my "rename" command does not seem to work like
the one you expect in your script. So I may have misinterpreted the
intent of it.

However, I would not be surprised if one could construct a situation in
which "-C" and "-M" produced different results. Since the content of all
the files is the same, git has to make a guess about which files match
up based on their filenames. The current heuristic is very stupid and
just tries to match basenames (e.g., moving "foo/Makefile" to
"bar/Makefile" is a better match than moving the same content to
"bar/foo.c"). But in this case, the basenames don't match at all.

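To make the basename idea concrete, here is a rough Python sketch. It is
not git's actual code; the function name basename_score is made up for
illustration, and it only models "same basename scores higher":

```python
import posixpath

def basename_score(src: str, dst: str) -> int:
    """Hypothetical tie-breaker: 1 if both paths end in the same basename, else 0."""
    return int(posixpath.basename(src) == posixpath.basename(dst))

# Same content moved to a new directory under the same name: preferred.
makefile_move = basename_score("foo/Makefile", "bar/Makefile")
# Same content moved and given an unrelated name: no preference.
unrelated_name = basename_score("foo/Makefile", "bar/foo.c")
```

When every candidate scores 0, as in the reported case, this kind of score
gives no information at all about which pairing to prefer.
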
By using "-C", we will typically have more rename sources available, and
we may therefore process the possible pairs in a different order. Since
our name heuristic is largely useless, our results depend on that order.
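
A toy model of that order dependence, again in Python and again only a
sketch: pair_up is a hypothetical greedy matcher, not the algorithm git
uses, and the paths are invented. It assumes every source has identical
content, so the basename check is the only signal:

```python
import posixpath

def same_basename(src, dst):
    return posixpath.basename(src) == posixpath.basename(dst)

def pair_up(dsts, sources):
    """Greedy sketch: each destination takes the first unused source
    with the best score. max() keeps the first of several equal maxima,
    so ties are decided purely by the order of `sources`."""
    remaining = list(sources)
    pairs = {}
    for dst in dsts:
        if not remaining:
            break
        best = max(remaining, key=lambda src: same_basename(src, dst))
        pairs[dst] = best
        remaining.remove(best)
    return pairs

dsts = ["new/one.txt", "new/two.txt"]
order_a = pair_up(dsts, ["old/a.txt", "old/b.txt"])
order_b = pair_up(dsts, ["old/b.txt", "old/a.txt"])
```

The two calls see the same files and differ only in candidate order, yet
they pair the destinations with different sources.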

I think the real solution is to improve the name heuristic. Something
like an edit distance would make more sense (though I think it is not as
simple as an edit distance across the whole pathname, as moving a
basename across directories should probably be preferred to changing the
filename inside a directory).
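
One way such a heuristic might look, as a Python sketch only. The name
rename_cost and the weights are assumptions picked for illustration;
nothing like this exists in git as far as this message is concerned. It
computes a plain Levenshtein distance separately on the directory part
and on the basename, and charges more for basename changes:

```python
import posixpath

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def rename_cost(src: str, dst: str, base_weight: int = 4, dir_weight: int = 1) -> int:
    """Hypothetical cost: lower is a better match. Basename edits are
    weighted more heavily than directory edits (weights are arbitrary)."""
    src_dir, src_base = posixpath.split(src)
    dst_dir, dst_base = posixpath.split(dst)
    return (base_weight * levenshtein(src_base, dst_base)
            + dir_weight * levenshtein(src_dir, dst_dir))

across_dirs = rename_cost("foo/Makefile", "bar/Makefile")
in_place = rename_cost("foo/Makefile", "foo/Makefile.in")
```

With these weights the cross-directory move is cheaper than the in-place
filename change, which is the preference described above. The right
weights, and whether directories should be compared per component rather
than per character, are open questions.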

Largely I think nobody has cared much because this only comes up when
you move multiple identical files. Quite often there is a minor
difference even between very similar files, and that is enough to come
up with sane results.

