On 9/29/2017 7:12 PM, Johannes Schindelin wrote:
Hi Philip,

On Fri, 15 Sep 2017, Philip Oakley wrote:

From: "Johannes Schindelin" <johannes.schinde...@gmx.de>

In light of such experiences, I have to admit that the notion that the
rename detection can always be improved in hindsight adds quite a bit of
insult to injury for those developers who are bitten by it.

Your list made me think that the hints should be directed toward what may be
considered existing solutions for those specific awkward cases.

So the hints could be (by type):
- template;licence;boiler-plate;standard;reference :: copy
- word-rename
- regex for word substitution changes (e.g. which chars are within 'Word-_0')
- regex for white-space changes (i.e. which chars are considered whitespace)
- move-dir path/glob spec
- move-file path/glob spec
(maybe list each 'group' of moves, so that once found the rest of the rename
detection follows the group.)

Once the particular hint is detected (path qualified) then the clue/hint is
used to assist in parsing the files to simplify the comparison task and locate
common lines, or common word patterns.

The first example is just a set of alternate terms folk use for the new
duplicate-file case.

The second is a hint that there have been a number of fairly global name
changes in the files, so not only do a word diff but also detect & summarise
those global changes (your class move example).

The third is the simpler global word changes, based on a limited char set
for a 'word' token list.

The fourth is where we are focussed on the white space part (complementing the
word token viewpoint).
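
To make the second and third hints a little more concrete, here is a minimal Python sketch of detecting and summarising global word changes between two versions of a file. It is only an illustration, not anything Git does today: the `WORD` character set and the `global_word_changes` name are assumptions made up for this example.

```python
import re
from collections import Counter
from difflib import SequenceMatcher

# Assumed 'word' character set (letters, digits, '_' and '-'); a real hint
# would let the user supply this regex.
WORD = re.compile(r"[A-Za-z0-9_-]+")

def global_word_changes(old_text, new_text):
    """Count word-for-word substitutions between two texts (hypothetical helper)."""
    old, new = WORD.findall(old_text), WORD.findall(new_text)
    counts = Counter()
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only equal-length replacements can be paired up word by word.
        if tag == "replace" and i2 - i1 == j2 - j1:
            counts.update(zip(old[i1:i2], new[j1:j2]))
    return counts

old_src = "Foo a = new Foo(); Foo b = a;"
new_src = "Bar a = new Bar(); Bar b = a;"
changes = global_word_changes(old_src, new_src)
```

A substitution that shows up many times, such as ("Foo", "Bar") here, is the kind of global change that could be summarised once and then factored out before comparing the files.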

The move hints are lists of path specs that each have distinctly moved.

It may be possible to order the hints as well, so that the detections work in
the right order, giving the heuristics a better chance!
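
As a rough sketch of what path-qualified, ordered hints might look like, the Python below keeps the hints in a list and returns the ones that apply to a given path in the order they were listed. Everything here is hypothetical: the `Hint` class, the kind names, the `detail` strings and `hints_for` are assumptions for illustration, not an existing Git feature or file format.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase

@dataclass(frozen=True)
class Hint:
    kind: str         # e.g. "copy", "word-rename", "move-dir" (made-up names)
    pathspec: str     # glob qualifying which paths the hint applies to
    detail: str = ""  # free-form payload, e.g. "OldClass=NewClass"

def hints_for(path, hints):
    """Return the hints whose pathspec matches path, preserving list order."""
    return [h for h in hints if fnmatchcase(path, h.pathspec)]

# List order doubles as detection order: moves first, then word changes.
HINTS = [
    Hint("move-dir", "lib/*", "lib/=src/lib/"),
    Hint("word-rename", "*.java", "OldClass=NewClass"),
    Hint("copy", "LICENSE*"),
]

applicable = [h.kind for h in hints_for("lib/foo.java", HINTS)]
```

Keeping the order in the list itself is just one possible design; an explicit priority field would work equally well.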

I think my point was: no matter how likely we thought it was that heuristic
rename detection could be perfected over time, history proved that assumption
incorrect.

Therefore, it would be good to have a way to tell Git about renames
explicitly so that it does not even need to use its heuristics.

Agreed.

It would be nice if every file (and tree) had a permanent GUID
associated with it.  Then the filename/pathname becomes a property
of the GUIDs.  Then you can exactly know about moves/renames with
minimal effort (and no guessing).  But I suppose that ship has sailed...
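
To illustrate the idea only (Git does not track any such identifier), here is a minimal Python sketch in which each snapshot is a mapping from a permanent file ID to its current path. The `renames` function and the sample IDs are made up for this example.

```python
def renames(before, after):
    """Return {guid: (old_path, new_path)} for IDs whose path changed.

    `before` and `after` map a hypothetical permanent file ID to the
    path it has in each snapshot.
    """
    return {
        guid: (before[guid], after[guid])
        for guid in before.keys() & after.keys()
        if before[guid] != after[guid]
    }

before = {"g1": "a.c", "g2": "b.c", "g3": "old.c"}
after = {"g1": "a.c", "g2": "src/b.c", "g4": "new.c"}
moved = renames(before, after)
```

With IDs like these, a move is an exact lookup rather than a similarity guess; IDs present on only one side are plain deletions or additions.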

Jeff
