On Fri, Feb 02, 2018 at 08:58:52PM -0500, Derrick Stolee wrote:

> I don't think pairing this with pack-objects or index-pack is a good
> direction, because the commit graph is not locked into a packfile the way
> the bitmap is. In fact, the entire ODB could be replaced independently and
> the graph is still valid (the commits in the graph may no longer have their
> paired commits in the ODB due to a GC; you should never navigate to those
> commits without having a ref pointing to them, so this is not immediately a
> problem).

One advantage of tying this to packs is that you can piggy-back on the
.idx to avoid storing object ids a second time. If we imagine that you
use a 32-bit index into the .idx instead, that's a savings of 16 bytes
per object (or more when we switch to a longer hash). You only need to
refer to commits and their root trees, though. So on something like
linux.git, you're talking about 2 * 700k * 16 = 21 megabytes you could
save.

That may not be worth worrying about too much, compared to the size of
the rest of the data. Disk space is obviously cheap, but I'm more
concerned about working-set size. However, 21 megabytes probably isn't
breaking the bank there these days (and it may even be faster, since the
commit-graph lookups can use the more compact index that contains only
commits, not other objects).

The big advantage of your scheme is that you can update the graph index
without repacking. The traditional advice has been that you should
always do a full repack during a gc (since it gives the most delta
opportunities). So metadata like reachability bitmaps were happy to tied
to packs, since you're repacking anyway during a gc. But my
understanding is that this doesn't really fly with the Windows
repository, where it's simply so big that you never obtain a single
pack, and just pass around slices of history in pack format.

So I think I'm OK with the direction here of keeping metadata caches
separate from the pack storage.

> This sort of interaction with GC is one reason why I did not include the
> automatic updates in this patch. The integration with existing maintenance
> tasks will be worth discussion in its own right. I'd rather demonstrate the
> value of having a graph (even if it is currently maintained manually) and
> then follow up with a focus to integrate with repack, gc, etc.
> 
> I plan to clean up this patch on Monday given the feedback I received the
> last two days (Thanks Jonathan and Szeder!). However, if the current builtin
> design will block merging, then I'll wait until we can find one that works.

If they're not tied to packs, then I think having a separate builtin
like this is the best approach. It gives you a plumbing command to
experiment with, and it can easily be called from git-gc.

-Peff

Reply via email to