On Thu, Feb 23, 2017 at 02:38:29PM -0800, Linus Torvalds wrote:
> > Thanks, I hadn't seen that yet. That doesn't look like it should be hard
> > to integrate into Git.
>
> Here's a *very* ugly patch that is absolutely disgusting and should not be
> used. But it does kind of work (I tested it with a faked-up extra patch
> that made git accept the broken pdf as a loose object).
>
> What do I mean by "kind of work"? It uses that ugly and slow checking
> SHA1 routine from the collision detection project for the SHA1 object
> verification, and it means that "git fsck" ends up being about twice as
> slow as it used to be.
Heh. I was just putting the finishing touches on a similar patch. Mine
is much less gross, in that it actually just adds a new USE_SHA1DC knob
(instead of, say, BLK_SHA1).
Here are the timings I came up with:
- compute sha1 over whole packfile
before: 1.349s
after: 5.067s
change: +275%
- rev-list --all
before: 5.742s
after: 5.730s
change: -0.2%
- rev-list --all --objects
before: 33.257s
after: 33.392s
change: +0.4%
- index-pack --verify
before: 2m20s
after: 5m43s
change: +145%
- git log --no-merges -10000 -p
before: 9.532s
after: 9.683s
change: +1.5%
So overall the sha1 computation is about 3-4x slower. But of
course most operations do more than just sha1. Accessing
commits and trees isn't slowed at all (both the +/- changes
there are well within the run-to-run noise). Accessing the
blobs is a little slower, but mostly drowned out by the cost
of things like actually generating patches.
The most-affected operation is `index-pack --verify`, which
is essentially just computing the sha1 on every object. It's
a bit worse than twice as slow, which means every push and
every fetch is going to experience that.
> For example, I suspect we could use our (much cleaner) block-sha1
> implementation and include just the ubc_check.c code with that, instead of
> the truly ugly C sha1 implementation that the sha1collisiondetection
> project uses.
>
> But to do that, somebody would have to really know how the unavoidable
> bit conditions check works with the intermediate hashes. I have only a
> "big picture" mental model of it (read: I'm not competent to do that).
Yeah. I started looking at that, but the ubc check happens after the
initial expansion. But AFAICT, block-sha1 mixes that expansion in with
the rest of the steps for efficiency. So perhaps somebody who really
understands sha1 and the new checks could figure it out, but I'm not at
all certain that adding it in wouldn't lose some of block-sha1's
efficiency (on top of the time to actually do the ubc check).
-Peff