Jeff King <p...@peff.net> writes:
> I think the hash here does not collide in that way. It really is just
> the last sixteen characters shoved into a uint32_t.
All bytes overlap with their adjacent byte because they are shifted
by only 2 bits, not 8 bits, when a new byte is brought in. We can
say that the topmost two bits of the result must have come from the
last character, but other than these, there are more than one input
byte for each bit position to be set/unset by, so two names that human
would not consider "similar" would be given the same hash, no?
That is useful for delta code because the code only needs that
similar things are grouped together, it does not mind things that
are not similar is also mixed to a group, as the end result is
primarily determined by similarity of the actual contents, not
What is under topic in this discussion is the other way around; we
know two paths have contents of the same similarity to the third one
and want to tie-break these two using how similar their pathnames
are to the third one.
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html