On Tue, Oct 16, 2012 at 12:15:21PM +0700, Nguyen Thai Ngoc Duy wrote:

> On Tue, Oct 16, 2012 at 11:51 AM, Jeff King <p...@peff.net> wrote:
> >> It's worth noting that a SHA-1 collision can be identified at the
> >> server because the server performs a byte-for-byte compare of both
> >> copies of the object to make sure they match exactly in every way. It's
> >> not fast, but it's safe. :-)
> >
> > Do we? I thought early versions of git did that, but we did not
> > double-check collisions any more for performance reasons. You don't
> > happen to remember where that code is, do you (not that it really
> > matters, but I am just curious)?
>
> We do. I touched that sha-1 collision code last time I updated
> index-pack, to support large blobs. We only do that when we receive an
> object that we already have, which should not happen often unless
> you're under attack, so little performance impact normally. Search
> "collision" in index-pack.c

Ah, thanks, I remember this now. I was thinking of the very early code
that checked every sha1 file write, e.g., the code killed off by
aac1794 (Improve sha1 object file writing., 2005-05-03). But that is
ancient history and not really relevant here.

Interesting that we check only in index-pack. If the pushed content is
small enough, we will call unpack-objects. That follows the usual code
path for writing the object, which will prefer the existing copy.
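
To make the difference between the two paths concrete, here is a rough
sketch. It is a Python toy, not git's actual C code: the dict-based
store and every name in it (object_id, write_prefer_existing,
write_checked, CollisionError) are hypothetical, and the object id is
passed in by the caller so that a collision can be simulated.

```python
import hashlib


class CollisionError(Exception):
    """Raised when two different byte strings claim the same object id."""


def object_id(data: bytes) -> str:
    # Simplification: real git hashes a "<type> <size>\0" header plus the body.
    return hashlib.sha1(data).hexdigest()


def write_prefer_existing(store: dict, oid: str, data: bytes) -> bool:
    """Keep whatever is already stored under oid; never compare contents."""
    if oid in store:
        return False  # existing copy wins, incoming bytes are dropped
    store[oid] = data
    return True


def write_checked(store: dict, oid: str, data: bytes) -> bool:
    """Compare byte-for-byte when oid is already present in the store."""
    existing = store.get(oid)
    if existing is None:
        store[oid] = data
        return True
    if existing != data:
        raise CollisionError(f"collision detected for {oid}")
    return False  # identical duplicate, nothing to write
```

With the first function a colliding object is silently ignored and the
original bytes stay in place; with the second the mismatch is reported.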

I suspect a site that is heavy on alternates is invoking the index-pack
code path more frequently than necessary (e.g., history gets pushed to
one forked repo, then when it goes to the next one, we may not share the
ref that tells the client we already have the object, so we receive it a
second time).

