On Wed, Mar 01, 2017 at 12:14:34PM -0800, Linus Torvalds wrote:
> > My biggest concern is the index-pack operation. Try this:
>
> I'm mobile right now, so I can't test, but isn't this perhaps at least partly
> due to the full checksum over the pack-file?
>
> We have two very different uses of SHA1: the actual object name hash, but
> also the sha1file checksums that we do on the index file and the pack files.
>
> And the checksum code really doesn't need the collision checking at all.

I don't think that helps. The sha1 over the pack-file takes about 1.3s
with openssl, and 5s with sha1dc. So we already know the increase from
the pack checksum is only a few seconds, not a few minutes.

And it makes sense if you think about the index-pack operation. It has
to inflate each object, resolving deltas, and checksum the result. And
the number of inflated bytes is _much_ larger than the on-disk bytes.

You can see the difference with:

  git cat-file --batch-all-objects \
    --batch-check='%(objectsize:disk) %(objectsize)' |
  perl -alne '
    $disk += $F[0]; $raw += $F[1];
    END { print "$disk $raw" }
  '

On linux.git that yields:

  1210521959 63279680406

That's over a 50x increase in the bytes we have to sha1 for objects
versus pack-checksums.
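
As a rough sanity check on that ratio, here is a minimal Python sketch of
the same arithmetic. It is not part of git; the helper names sum_sizes and
inflation_ratio are made up for illustration, and the input is assumed to
be lines of "<disk-size> <raw-size>" as printed by the --batch-check format
above.

```python
def sum_sizes(lines):
    """Sum both columns of '<disk-size> <raw-size>' lines.

    Hypothetical helper: mirrors what the perl one-liner accumulates
    in $disk and $raw.
    """
    disk = raw = 0
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        disk += int(fields[0])
        raw += int(fields[1])
    return disk, raw


def inflation_ratio(disk, raw):
    """Return how many inflated bytes there are per on-disk byte."""
    return raw / disk


# Totals quoted above for linux.git, fed in as a single pre-summed line.
disk, raw = sum_sizes(["1210521959 63279680406"])
ratio = inflation_ratio(disk, raw)
print(f"{disk} {raw} ratio={ratio:.1f}x")
```

Feeding sum_sizes() the per-object lines from the cat-file command (for
example by iterating over sys.stdin) instead of the pre-summed totals
should give the same two numbers for a given repository.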
-Peff