On Wed, Aug 08, 2018 at 04:12:10PM -0700, Jonathan Tan wrote:
> Many invocations of for_each_object_in_pack() and
> for_each_packed_object() (which invokes the former) subsequently check
> at least the type of the packed object, necessitating accessing the
> packfile itself. For locality reasons, it is thus better to iterate in
> pack order, instead of index order. Teach for_each_object_in_pack() to
> iterate in pack order by first creating a reverse index.
>
> This is based off work by Jeff King.
>
> Signed-off-by: Jonathan Tan <[email protected]>
> ---
> After writing this patch and looking at it further, I'm not sure if this
> is a clear benefit, but here's the patch anyway. In particular,
> builtin/fsck.c and builtin/cat-file.c just deal with the OID directly
> and does not access the packfile at all (at least at the time of
> invoking for_each_packed_object). And revision.c, if we are excluding
> promisor objects, parses each packed promisor object, but it seems
> possible to avoid doing that (replacing the parse_object() by
> lookup_unknown_object() still passes tests).
Even if you just use the oid to do a separate lookup in the object
database, there's still a benefit in accessing the objects in pack
order. The case in cat-file needs more than this, though, since it
separately sorts the output (it has to, because it has to merge and
de-dup the output from several packs plus loose objects).
With the patch below on top of yours, I get:
$ time git.v2.18.0 cat-file --batch-all-objects --buffer --batch | wc -c
6938365964
real 0m44.686s
user 0m42.932s
sys 0m5.283s
$ time git.compile cat-file --batch-all-objects --buffer --batch | wc -c
8289859070
real 0m7.007s
user 0m5.542s
sys 0m4.005s
But:
- it needs to de-duplicate using a hashmap (which is why the output is
so much bigger in the second case)
- it probably needs to be enabled explicitly by the user, since
cat-file is plumbing and callers may be relying on the existing sort
order
I can try to pick this up and carry the cat-file bits to completion if
you want, but probably not until tomorrow or Friday.
-Peff