> for_each_object_in_pack() is a fine way to make sure that you list
> everythning in a pack, but I suspect it is a horrible way to feed a
> list of objects to pack-objects, as it goes by the .idx order, which
> is by definition a way to enumerate objects in a randomly useless
> order.

That's true.

> Do we already have an access to the in-core reverse index for the
> pack at this point in the code?

As far as I can tell, no. These patches construct the list of promisor
objects in repack.c (which does not use the revindex at all), to be
processed by pack-objects in a different process (which does use the
revindex in reuse-delta mode, which is the default). I guess that we
could have access to the revindex if we were to libify pack-objects and
run it in the same process as repack.c or if we were to add a special
mode to pack-objects that reads for itself the list of all the promisor
objects.

> If so, we can enumerate the objects
> in the offset order without incurring a lot of cost (building the
> in-core revindex is the more expensive part).  When writing a pack,
> we try to make sure that related objects are written out close to
> each other [*1*], so listing them in the offset order (with made-up
> pathname information to _force_ that objects that live close
> together in the original pack are grouped together by getting
> similar names) might give us a better than horrible deltification.
> I dunno.

Before I write the NEEDSWORK comment, just to clarify - do you mean
better than horrible object locality? I think deltification still occurs
because pack-objects sorts the input when doing the windowed
deltification, for example:

  git rev-parse HEAD:fetch-pack.c HEAD HEAD^ HEAD^^ \
        HEAD^^^ v2.17.0:fetch-pack.c | git pack-objects --window=2 abc

produces a packfile with a delta, as checked by `verify-pack -v`.

>       Side note *1*: "related" has two axis, and one is relevant
>       for better deltification, while the other is not useful.
>       The one I have in mind here is that we write set of blobs
>       that belong to the same "delta family" together.

As far as I can see, they do not need to be adjacent in the packfile to
have one be a delta against the other.

> I do not think such a "make it a bit better than horrible" is necessary
> in an initial version, but it deserves an in-code NEEDSWORK comment
> to remind us that we need to measure and experiment.

OK, I'll do this in the next reroll.

Reply via email to