On Thu, Aug 2, 2012 at 4:27 AM, Jeff King <p...@peff.net> wrote:
> Yes, if you go with a commit-based approach, you can do either notes or
> in-commit messages. In other words, I would break the solutions down as:
>
>   1. Store sha1+sha1 -> score mapping (i.e., what I suggested). This is
>      fundamentally a global store, not a per-commit store. For storage,
>      you can do one (or a combination) of:
>
>        a. Store the mapping in some local file. Fast, but can't be shared.
>
>        b. Store the mapping in a note (probably indexed by the destination
>           blob sha1). Less efficient, but easy to share.
>
>      I implemented (1a). Implementing (1b) would be easy, but for a full-on
>      cache (especially for "-C"), I think the resulting size might be
(1a) is good regardless of rename overrides. Why don't you polish and
submit it? We can set some criteria to limit the cache size while
keeping computation reasonably low. Caching rename scores only for file
pairs whose size exceeds a limit is one. Rename matrix size could also
be a candidate. We could even cache rename scores only for recent
commits (i.e., those close to the heads), on the assumption that people
diff/apply recent commits more often.
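A minimal sketch of the (1a) idea with the first size criterion, written in Python for brevity rather than git's C: a local file maps a (source blob sha1, destination blob sha1) pair to a rename score, and a pair is only cached when one of its blobs reaches a size limit. The class name, the JSON file format and the 4096-byte default are all hypothetical, not taken from the patch discussed above.

```python
import json
import os


class RenameScoreCache:
    """Hypothetical (1a)-style local cache: (src_sha1, dst_sha1) -> score.

    Only pairs where either blob is at least `min_size` bytes are stored,
    on the assumption that small pairs are cheap enough to rescore.
    """

    def __init__(self, path, min_size=4096):
        self.path = path
        self.min_size = min_size
        self.scores = {}
        if os.path.exists(path):
            with open(path) as f:
                # JSON keys must be strings, so pairs are stored as "src:dst".
                self.scores = json.load(f)

    @staticmethod
    def _key(src_sha1, dst_sha1):
        return f"{src_sha1}:{dst_sha1}"

    def get(self, src_sha1, dst_sha1):
        # Returns None on a cache miss.
        return self.scores.get(self._key(src_sha1, dst_sha1))

    def put(self, src_sha1, dst_sha1, score, src_size, dst_size):
        # Skip small pairs: recomputing their score is cheap anyway.
        if max(src_size, dst_size) < self.min_size:
            return False
        self.scores[self._key(src_sha1, dst_sha1)] = score
        return True

    def save(self):
        with open(self.path, "w") as f:
            json.dump(self.scores, f)
```

Other limits, such as the rename matrix size or a "recent commits only" rule, would be extra checks in `put`.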
> All solutions under (2) suffer from the same problem: they are accurate
> only for a single diff. For other diffs, you would either have to not
> use the feature, or you would be stuck traversing the history and
> assigning a temporary file identity (e.g., given commits A->B->C, and in
> A->B we rename "foo" to "bar", the diff between A and C could discover
> that A's "foo" corresponds to C's "bar").
Yeah. If we go with manual overrides, I expect users to deal with
these cases manually too. IOW, they'll need to create a mapping for
A->C themselves. In some cases, like merge, we can help by detecting
that manual overrides exist and letting users know that they are
ignored. For merge, I think we can just check every commit we
traverse while looking for merge bases.
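To make the A->B->C case concrete, here is a hypothetical sketch of following one path along a linear chain of commits, where each step carries detected renames plus optional manual overrides. Overrides beat detection by default; with `use_overrides=False` (roughly what merge would do) they are skipped, but the commits carrying them are still reported so a warning could be printed. The data layout and all names are assumptions for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical per-commit record: (commit id, detected renames, manual overrides).
# Each rename dict maps an old path to its new path; None means "deleted".
Step = Tuple[str, Dict[str, Optional[str]], Dict[str, Optional[str]]]


def trace_path(
    path: str, steps: List[Step], use_overrides: bool = True
) -> Tuple[Optional[str], List[str]]:
    """Follow `path` from the first commit of the chain to the last.

    Returns the path in the last commit (None if it was deleted) and the
    ids of commits that carry manual overrides, applied or not.
    """
    commits_with_overrides: List[str] = []
    current: Optional[str] = path
    for commit_id, detected, overrides in steps:
        if overrides:
            commits_with_overrides.append(commit_id)
        if current is None:
            continue  # already deleted; keep scanning only to report overrides
        renames = dict(detected)
        if use_overrides:
            renames.update(overrides)  # a manual override beats detection
        # A path that this step did not rename keeps its name.
        current = renames.get(current, current)
    return current, commits_with_overrides
```

With steps `[("B", {"foo": "bar"}, {}), ("C", {}, {"bar": "baz"})]`, A's "foo" ends up as "baz" in C when overrides are honored, and as "bar" when they are ignored, with commit C flagged either way.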
> For this reason, I'm not sure that stored overrides like this are
> generally useful in the long run. I think storage is useful for
> _caching_ the results, because it doesn't have to be perfect; it just
> helps with some repetitive queries. Whereas for overriding, I think it
> is much more interesting to override a _particular_ diff. E.g., to say "I
> am merging X and Y, and please pretend that Y renamed "foo" to "bar"
> when you do rename detection."
> And in that sense, your "git log" example can be considered a
> special case of this: you are saying that the diff from $commit to
> $commit^ is done frequently, so rather than saying "please pretend..."
> each time, you would like to store the information forever. And storing
> it in the commit message or a note is one way of doing that.
Yep, specifying rename overrides between two trees is probably better.
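A rough sketch of what "rename overrides between two trees" could mean: user-supplied (src, dst) pairs are honored first, and only the remaining deleted/added paths are scored. `difflib.SequenceMatcher.ratio()` stands in for git's own similarity score, and the greedy best-first pairing and 0.5 threshold are assumptions; this is not git's actual rename code.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple


def detect_renames(
    deleted: Dict[str, str],
    added: Dict[str, str],
    overrides: List[Tuple[str, str]],
    threshold: float = 0.5,
) -> Dict[str, str]:
    """Pair deleted paths with added paths between two trees.

    `deleted` and `added` map path -> file content. Override pairs are
    applied first; the rest are paired greedily by similarity.
    """
    renames: Dict[str, str] = {}
    for src, dst in overrides:
        # Ignore overrides that name paths not actually deleted/added.
        if src in deleted and dst in added:
            renames[src] = dst
    used_dst = set(renames.values())

    candidates = []
    for src, old in deleted.items():
        if src in renames:
            continue
        for dst, new in added.items():
            if dst in used_dst:
                continue
            score = SequenceMatcher(None, old, new).ratio()
            if score >= threshold:
                candidates.append((score, src, dst))

    # Best-scoring pairs first; each path takes part in at most one rename.
    for score, src, dst in sorted(candidates, reverse=True):
        if src not in renames and dst not in used_dst:
            renames[src] = dst
            used_dst.add(dst)
    return renames
```

Here an override lets "foo" -> "bar" survive even when the contents no longer look alike, while unrelated pairs are still found by scoring.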
> I don't think there's anything fundamentally _wrong_ with that, but I
> kind of question its usefulness. In other words, what is the point in
> doing so? If it is to inform the user that semantically the commit did a
> rename, even though the content changed enough that rename detection
> does not find it, then I would argue that you should simply state it in
> the commit message (or in a human-readable git-note, if it was only
> realized after the fact).
> But there is not much point in making it machine-readable, since the
> interesting machine-readable things we do with renames are:
>   1. Show the diff against the rename src, which can often be easier to
>      read. Except that if rename detection did not find it, it is
>      probably _not_ going to be easier to read.
Probably. Still, it helps "git log --follow" follow the correct
track in the 1% of cases where rename detection does go wrong.
>   2. Applying content to the destination of a merge. But you're almost
>      never doing the diff between a commit and its parent, so the
>      information would be useless.
Having a way to interfere with rename detection, even manually,
could be good in this case if it reduces conflicts. We could feed
rename overrides via the command line.