Jeff King <p...@peff.net> writes:
> On Tue, Jul 01, 2014 at 10:08:15AM -0700, Junio C Hamano wrote:
>> I didn't think it through but my gut feeling is that we could change
>> the name similarity score to be the length of the tail part that
>> matches (e.g. 1.a to a/2.a that has the same two bytes at the tail
>> is a better match than to a/2.b that does not share any tail, and to
>> a/1.a that shares the three bytes at the tail is an even better
>> match).
> The delta heuristics in pack-objects use pack_name_hash, which claims:
> * This effectively just creates a sortable number from the
> * last sixteen non-whitespace characters. Last characters
> * count "most", so things that end in ".c" sort together.
> which might be another option (and seems like a superset of the basename
> check, short of basenames that are longer than 16 characters).
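
For reference, the hash in question is tiny. What follows is a sketch
written from memory, not a verbatim copy of the tree: the name
pack_name_hash_sketch() is made up for this mail, and the cast to
unsigned char is my addition so that the standard isspace() is safe to
call.

```c
#include <ctype.h>
#include <stdint.h>

/*
 * Sketch of the name hash used by the delta heuristics (assumed
 * shape, see above).  Each new character lands in the top byte and
 * everything seen earlier is shifted down by two bits, so the last
 * characters of the name dominate the resulting number.
 */
static uint32_t pack_name_hash_sketch(const char *name)
{
	uint32_t c, hash = 0;

	if (!name)
		return 0;

	while ((c = (unsigned char)*name++) != 0) {
		if (isspace((int)c))
			continue;	/* whitespace does not contribute */
		hash = (hash >> 2) + (c << 24);
	}
	return hash;
}
```

With this sketch "a.c" and "b.c" come out as 0x74900000 and 0x74a00000,
i.e. close neighbours when sorted, and "foo bar.c" gets exactly the same
value as "foobar.c" because whitespace is skipped.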
I am, however, not sure whether the code that computes the similarity
score is as tolerant of false positives (i.e. dissimilar names that
happen to hash together and get clumped into the same bin or into
nearby bins) as the existing callers of pack_name_hash() are.
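
To make the tail-match idea quoted at the top concrete, here is a
minimal sketch of such a score.  common_tail_len() is a hypothetical
helper invented for this mail, not something that exists in the tree.
Because it compares the actual bytes, it cannot report a match for names
that merely hash close to each other.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: count how many trailing bytes two path names
 * have in common.  A larger value means a better name match.
 */
static size_t common_tail_len(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b), n = 0;

	/* Walk backwards from the end of both names while bytes agree. */
	while (n < la && n < lb && a[la - 1 - n] == b[lb - 1 - n])
		n++;
	return n;
}
```

For the names in the example above, this gives 2 for 1.a against a/2.a,
0 for 1.a against a/2.b, and 3 for 1.a against a/1.a.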