I've bisected a performance regression (noticed by Quentin and myself)
which caused a 'git fetch' to go from ~1m30s to ~2m40s:
Author: Jeff King <p...@peff.net>
Date: Mon Jun 30 13:04:03 2014 -0400
prepare_packed_git_one: refactor duplicate-pack check
Reverting this commit from a recent mainline master brings the time back
down from ~2m24s to ~1m19s.
The bisect log:
v2.8.1 -- 2m41s, 2m50s (bad)
v1.9.0 -- 1m39s, 1m46s (good)
188.8.131.522.gea1fd48 -- 2m40s
184.108.40.206.gc285171 -- 2m42s
220.127.116.11.g6753d8a -- 1m27s
18.104.22.1680.g60e2f5a -- 1m34s
22.214.171.1241.gad25da0 -- 2m39s
126.96.36.1995.ge0a064a -- 1m30s
188.8.131.522.g2e42338 -- 2m29s
2.0.0.rc1.32.g5165dd5 -- 1m30s
184.108.40.2067.g5418212 -- 1m32s
220.127.116.11.g6dda4e6 -- 1m28s
18.104.22.1689.g6e40947 -- 2m25s
22.214.171.124.g47bf4b0 -- 2m18s
126.96.36.199.gd6cd00c -- 1m36.542s
However, the commit found by 'git bisect' above appears just fine to me;
I haven't been able to spot a bug in it.
A closer inspection reveals that the real problem is an extremely hot
path: more than -- holy cow -- 4,106,756,451 iterations over the
'packed_git' list for a single 'git fetch' on my repository. I'm
guessing the patch above just made the inner loop ever so slightly
slower.
My .git/objects/pack/ has ~2088 files (1042 idx files, 1042 pack files,
and 4 tmp_pack_* files).
I am convinced that it is not necessary to rescan the entire pack
directory 11,348 times or do all 4 _BILLION_ memcmp() calls for a single
'git fetch', even for a large repository like mine.
I could try to write a patch to reduce the number of times we rescan
the pack directory. However, I had never even looked at this file
before today, so any hints regarding what would need to be done would
be appreciated.
(Cc'd some people with changes in the area.)