On Fri, Feb 20, 2015 at 1:04 AM, Duy Nguyen <pclo...@gmail.com> wrote:
> On Fri, Feb 20, 2015 at 6:29 AM, Ævar Arnfjörð Bjarmason
> <ava...@gmail.com> wrote:
>> Anecdotally I work on a repo at work (where I'm mostly "the Git guy") that's:
>>
>>  * Around 500k commits
>>  * Around 100k tags
>>  * Around 5k branches
>>  * Around 500 commits/day, almost entirely to the same branch
>>  * 1.5 GB .git checkout.
>>  * Mostly text source, but some binaries (we're trying to cut down[1] on those)
>
> Would be nice if you could make an anonymized version of this repo
> public. Working on a "real" large repo is better than an artificial
> one.

Yeah, I'll try to do that.

>> But actually most of "git fetch" is spent in the reachability check
>> subsequently done by "git-rev-list" which takes several seconds. I
>
> I wonder if reachability bitmap could help here..

I could have sworn I had that enabled already but evidently not. I did
test it and it cut down on clone times a bit. Now our daily repacking
is:

        git --git-dir={} gc &&
        git --git-dir={} pack-refs --all --prune &&
        git --git-dir={} repack -Ad --window=250 --depth=100 \
            --write-bitmap-index --pack-kept-objects &&

It's not clear to me from the documentation whether this should just
be enabled on the server, or the clients too. In any case I've enabled
it on both.
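
A quick way to confirm that the repack actually wrote a bitmap is to
look for a .bitmap file next to the pack. This is only a minimal sketch:
the function name has_bitmap is made up for illustration, and it assumes
the argument is the path to the git directory itself (a bare repo, or
the .git directory of a checkout).

```shell
# has_bitmap GIT_DIR: succeed if GIT_DIR holds at least one pack bitmap.
# (Hypothetical helper; bitmaps live beside the pack as pack-<id>.bitmap.)
has_bitmap() {
    for f in "$1"/objects/pack/pack-*.bitmap; do
        # If the glob matched nothing, $f is the literal pattern and
        # the -e test fails, so we fall through to "return 1".
        [ -e "$f" ] && return 0
    done
    return 1
}
```

Usage would be something like
`has_bitmap /gitrepos/core.git && echo "bitmap present"`.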

Even with it enabled on both, a "git pull" that pulls down just one
commit on one branch takes 13s. The trace is included at the end of
this mail.

>> haven't looked into it but there's got to be room for optimization
>> there, surely it only has to do reachability checks for new refs, or
>> could run in some "I completely trust this remote not to send me
>> corrupt data" mode (which would make sense within a company where you can
>> trust your main Git box).
>
> No, it's not just about trusting the server side, it's about catching
> data corruption on the wire as well. We have a trick to avoid
> reachability check in clone case, which is much more expensive than a
> fetch. Maybe we could do something further to help the fetch case _if_
> reachability bitmaps don't help.

Still, if that's indeed a big bottleneck what's the worst-case
scenario here? That the local repository gets hosed? The server will
still recursively validate the objects it gets sent, right?

I wonder if a better trade-off in that case would be to skip this in
some situations and instead put something like "git fsck" in a
cronjob.
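
As a rough sketch of what such a cronjob could run (this only covers the
periodic "git fsck" half of the idea, not skipping anything during
fetch; the function name fsck_repos and the paths are assumptions made
up for illustration):

```shell
# fsck_repos GIT_DIR...: run "git fsck" on each repository and print the
# ones that fail. Returns non-zero if any repository failed.
# (Hypothetical helper; plain "git fsck" with no extra options.)
fsck_repos() {
    status=0
    for repo in "$@"; do
        if ! git --git-dir="$repo" fsck >/dev/null 2>&1; then
            echo "fsck failed: $repo"
            status=1
        fi
    done
    return "$status"
}
```

A nightly crontab entry could then source a file defining this function
and call `fsck_repos /gitrepos/core.git`, mailing the output if there is
any.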

Here's the "git pull" trace mentioned above:

$ time GIT_TRACE=1 git pull
13:06:13.603781 git.c:555               trace: exec: 'git-pull'
13:06:13.603936 run-command.c:351       trace: run_command: 'git-pull'
13:06:13.620615 git.c:349               trace: built-in: git
'rev-parse' '--git-dir'
13:06:13.631602 git.c:349               trace: built-in: git
'rev-parse' '--is-bare-repository'
13:06:13.636103 git.c:349               trace: built-in: git
'rev-parse' '--show-toplevel'
13:06:13.641491 git.c:349               trace: built-in: git 'ls-files' '-u'
13:06:13.719923 git.c:349               trace: built-in: git
'symbolic-ref' '-q' 'HEAD'
13:06:13.728085 git.c:349               trace: built-in: git 'config'
'branch.trunk.rebase'
13:06:13.738160 git.c:349               trace: built-in: git 'config' 'pull.ff'
13:06:13.743286 git.c:349               trace: built-in: git
'rev-parse' '-q' '--verify' 'HEAD'
13:06:13.972091 git.c:349               trace: built-in: git
'rev-parse' '--verify' 'HEAD'
13:06:14.149420 git.c:349               trace: built-in: git
'update-index' '-q' '--ignore-submodules' '--refresh'
13:06:14.294098 git.c:349               trace: built-in: git
'diff-files' '--quiet' '--ignore-submodules'
13:06:14.467711 git.c:349               trace: built-in: git
'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
13:06:14.683419 git.c:349               trace: built-in: git
'rev-parse' '-q' '--git-dir'
13:06:15.189707 git.c:349               trace: built-in: git
'rev-parse' '-q' '--verify' 'HEAD'
13:06:15.335948 git.c:349               trace: built-in: git 'fetch'
'--update-head-ok'
13:06:15.691303 run-command.c:351       trace: run_command: 'ssh'
'git.example.com' 'git-upload-pack '\''/gitrepos/core.git'\'''
13:06:17.095662 run-command.c:351       trace: run_command: 'rev-list'
'--objects' '--stdin' '--not' '--all' '--quiet'
remote: Counting objects: 6, done.
remote: Compressing objects: 100% (6/6), done.
13:06:20.426346 run-command.c:351       trace: run_command:
'unpack-objects' '--pack_header=2,6'
13:06:20.431806 exec_cmd.c:130          trace: exec: 'git'
'unpack-objects' '--pack_header=2,6'
13:06:20.437343 git.c:349               trace: built-in: git
'unpack-objects' '--pack_header=2,6'
remote: Total 6 (delta 0), reused 0 (delta 0)
Unpacking objects: 100% (6/6), done.
13:06:20.444196 run-command.c:351       trace: run_command: 'rev-list'
'--objects' '--stdin' '--not' '--all'
13:06:20.447135 exec_cmd.c:130          trace: exec: 'git' 'rev-list'
'--objects' '--stdin' '--not' '--all'
13:06:20.451283 git.c:349               trace: built-in: git
'rev-list' '--objects' '--stdin' '--not' '--all'
From ssh://git.example.com/gitrepos/core
   02d33d2..41e72c4  core      -> origin/core
13:06:22.559609 run-command.c:351       trace: run_command: 'gc' '--auto'
13:06:22.562176 exec_cmd.c:130          trace: exec: 'git' 'gc' '--auto'
13:06:22.565661 git.c:349               trace: built-in: git 'gc' '--auto'
13:06:22.594980 git.c:349               trace: built-in: git
'rev-parse' '-q' '--verify' 'HEAD'
13:06:22.845728 git.c:349               trace: built-in: git
'show-branch' '--merge-base' 'refs/heads/core'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
'02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:23.087586 git.c:349               trace: built-in: git 'fmt-merge-msg'
13:06:23.341451 git.c:349               trace: built-in: git
'rev-parse' '--parseopt' '--stuck-long' '--' '--onto'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
13:06:23.350513 git.c:349               trace: built-in: git
'rev-parse' '--git-dir'
13:06:23.362011 git.c:349               trace: built-in: git
'rev-parse' '--is-bare-repository'
13:06:23.365282 git.c:349               trace: built-in: git
'rev-parse' '--show-toplevel'
13:06:23.372589 git.c:349               trace: built-in: git 'config'
'--bool' 'rebase.stat'
13:06:23.377056 git.c:349               trace: built-in: git 'config'
'--bool' 'rebase.autostash'
13:06:23.382102 git.c:349               trace: built-in: git 'config'
'--bool' 'rebase.autosquash'
13:06:23.389458 git.c:349               trace: built-in: git
'rev-parse' '--verify' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:23.608894 git.c:349               trace: built-in: git
'rev-parse' '--verify' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:23.894026 git.c:349               trace: built-in: git
'symbolic-ref' '-q' 'HEAD'
13:06:23.898918 git.c:349               trace: built-in: git
'rev-parse' '--verify' 'HEAD'
13:06:24.102269 git.c:349               trace: built-in: git
'rev-parse' '--verify' 'HEAD'
13:06:24.338636 git.c:349               trace: built-in: git
'update-index' '-q' '--ignore-submodules' '--refresh'
13:06:24.539912 git.c:349               trace: built-in: git
'diff-files' '--quiet' '--ignore-submodules'
13:06:24.729362 git.c:349               trace: built-in: git
'diff-index' '--cached' '--quiet' '--ignore-submodules' 'HEAD' '--'
13:06:24.938533 git.c:349               trace: built-in: git
'merge-base' '41e72c42addc5075e8009a3eebe914fa0ce98b27'
'02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:25.197791 git.c:349               trace: built-in: git 'diff'
'--stat' '--summary' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
[details on updated files]
13:06:25.488275 git.c:349               trace: built-in: git
'checkout' '-q' '41e72c42addc5075e8009a3eebe914fa0ce98b27^0'
13:06:26.467413 git.c:349               trace: built-in: git
'update-ref' 'ORIG_HEAD' '02d33d2be7f8601c3502fdd89b0946447d7cdf15'
Fast-forwarded trunk to 41e72c42addc5075e8009a3eebe914fa0ce98b27.
13:06:26.716256 git.c:349               trace: built-in: git 'rev-parse' 'HEAD'
13:06:26.958595 git.c:349               trace: built-in: git
'update-ref' '-m' 'rebase finished: refs/heads/core onto
41e72c42addc5075e8009a3eebe914fa0ce98b27' 'refs/heads/core'
'41e72c42addc5075e8009a3eebe914fa0ce98b27'
'02d33d2be7f8601c3502fdd89b0946447d7cdf15'
13:06:27.205320 git.c:349               trace: built-in: git
'symbolic-ref' '-m' 'rebase finished: returning to refs/heads/core'
'HEAD' 'refs/heads/core'
13:06:27.208748 git.c:349               trace: built-in: git 'gc' '--auto'
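
To see where the wall-clock time goes in a trace like this one, the gap
between consecutive timestamps can be computed with a little awk. This
is a rough sketch: trace_gaps is a name made up here, it assumes the
trace does not cross midnight, and it only looks at lines that start
with a GIT_TRACE timestamp (wrapped continuation lines and "remote:"
output are skipped).

```shell
# trace_gaps: read GIT_TRACE output on stdin. For each timestamped line,
# print the seconds elapsed until the next timestamped line, a tab, and
# the line itself. The last timestamped line has no successor and is
# not printed.
trace_gaps() {
    awk '
    /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]\.[0-9]+ / {
        split($1, t, ":")
        now = t[1] * 3600 + t[2] * 60 + t[3]
        if (have_prev)
            printf "%.3f\t%s\n", now - prev, prev_line
        prev = now
        prev_line = $0
        have_prev = 1
    }'
}
```

Since GIT_TRACE=1 writes to stderr, usage would be along the lines of
`GIT_TRACE=1 git pull 2>&1 | trace_gaps | sort -rn | head` to list the
slowest steps first.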