On Fri, Oct 05, 2018 at 09:45:47AM -0400, Derrick Stolee wrote:

> My misunderstanding was that your proposed change to gc computes the
> commit-graph in either of these two cases:
> 
> (1) The auto-GC threshold is met.
> 
> (2) There is no commit-graph file.
> 
> And what I hope to have instead of (2) is (3):
> 
> (3) The commit-graph file is "sufficiently behind" the tip refs.
> 
> This condition is intentionally vague at the moment. It could be that we
> hint that (3) holds by saying "--post-fetch" (i.e. "We just downloaded a
> pack, and it probably contains a lot of new commits") or we could create
> some more complicated condition based on counting reachable commits with
> infinite generation number (the number of commits not in the commit-graph
> file).
> 
> I like that you are moving forward to make the commit-graph be written more
> frequently, but I'm trying to push us in a direction of writing it even more
> often than your proposed strategy. We should avoid creating too many
> orthogonal conditions that trigger the commit-graph write, which is why I'm
> pushing on your design here.
> 
> Anyone else have thoughts on this direction?

Yes, I think measuring "sufficiently behind" is the right thing.
Everything else is a proxy or heuristic, and will run into corner cases.
E.g., I have some small number of objects and then do a huge fetch, and
now my commit-graph only covers 5% of what's available.

We know how many objects are in the graph already. And it's not too
expensive to get the number of objects in the repository. We can do the
same sampling for loose objects that "gc --auto" does, and counting
packed objects just involves opening up the .idx files (that can be slow
if you have a ton of packs, but you'd want to either repack or use a
.midx in that case anyway, either of which would help here).

So can we really just take (total_objects - commit_graph_objects) and
compare it to some threshold?

-Peff

Reply via email to