Jeff King <> writes:

> On Tue, Jul 01, 2014 at 10:08:15AM -0700, Junio C Hamano wrote:
>> I didn't think it through but my gut feeling is that we could change
>> the name similarity score to be the length of the tail part that
>> matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
>> is a better match than to a/2.b that does not share any tail, and to
>> a/1.a that shares the three bytes at the tail is an even better
>> match).
>
> The delta heuristics in pack-objects use pack_name_hash, which claims:
>         /*
>          * This effectively just creates a sortable number from the
>          * last sixteen non-whitespace characters. Last characters
>          * count "most", so things that end in ".c" sort together.
>          */
> which might be another option (and seems like a superset of the basename
> check, short of basenames that are longer than 16 characters).


I am, however, not sure whether the code that computes the similarity
score tolerates false positives (i.e. dissimilar names that happen to
hash into the same bin or into nearby bins) as well as the existing
callers of pack_name_hash() do.
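
For comparison, the tail-length score from the quoted suggestion cannot
produce that kind of false positive, because it compares the two names
directly instead of going through a hash. A minimal, untested-in-tree
sketch follows; tail_match_len() is a hypothetical name, not an existing
helper, and how its result would be folded into the similarity score is
left open:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing git function): return the
 * number of bytes the two paths share at their tails.
 */
static size_t tail_match_len(const char *a, const char *b)
{
	size_t la, lb, n = 0;

	assert(a && b);
	la = strlen(a);
	lb = strlen(b);

	/* Walk backwards from the end of both names while bytes agree. */
	while (n < la && n < lb && a[la - 1 - n] == b[lb - 1 - n])
		n++;
	return n;
}
```

With the names from the example above, "1.a" scores 2 against "a/2.a",
0 against "a/2.b", and 3 against "a/1.a".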

