On Fri, Mar 16, 2018 at 04:06:39PM -0400, Jeff King wrote:

> > Furthermore, in order to look at an object it has to be zlib inflated
> > first, and since commit objects tend to be much smaller than trees and
> > especially blobs, there are a lot less bytes to inflate:
> > 
> >   $ grep ^commit type-size |cut -d' ' -f2 |avg
> >   34395730 / 53754 = 639
> >   $ cat type-size |cut -d' ' -f2 |avg
> >   3866685744 / 244723 = 15800
> > 
> > So a simple revision walk inflates less than 1% of the bytes that the
> > "enumerate objects packfiles" approach has to inflate.
> 
> I don't think this is quite accurate. It's true that we have to
> _consider_ every object, but Git is smart enough not to inflate each one
> to find its type. For loose objects we just inflate the header. For
> packed objects, we either pick the type directly out of the packfile
> header (for a non-delta) or can walk the delta chain (without actually
> looking at the data bytes!) until we hit the base.

Hmm, so that's a big part of the problem with this patch series. It
actually _does_ unpack every object with --stdin-packs to get the type,
which is just silly. With the patch below, my time for "commit-graph
write --stdin-packs" on linux.git goes from over 5 minutes (I got bored
and killed it) to 17 seconds.

diff --git a/commit-graph.c b/commit-graph.c
index 6348bab82b..cf1da2e8c1 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -491,11 +491,12 @@ static int add_packed_commits(const struct object_id *oid,
 {
        struct packed_oid_list *list = (struct packed_oid_list*)data;
        enum object_type type;
-       unsigned long size;
-       void *inner_data;
        off_t offset = nth_packed_object_offset(pack, pos);
-       inner_data = unpack_entry(pack, offset, &type, &size);
-       FREE_AND_NULL(inner_data);
+       struct object_info oi = OBJECT_INFO_INIT;
+
+       oi.typep = &type;
+       if (packed_object_info(pack, offset, &oi) < 0)
+               die("unable to get type of object %s", oid_to_hex(oid));
 
        if (type != OBJ_COMMIT)
                return 0;

-Peff

Reply via email to