On Wed, Aug 14, 2013 at 07:04:37PM +0200, Stefan Beller wrote:
> But apart from my blabbering, I think ivegy made a good point:
> The C parts just don't rely on external things, but only libc and
> kernel, so it may be nicer than a shell script. Also as it is used
> serversided, the performance aspect is not negligible.
> I included Jeff King, who maybe could elaborate on git-repack on the
I don't think the performance of repack as a C program versus a shell
script is really relevant to us at GitHub. Sure, we run a fair number of
repacks, but the cost is totally dominated by the pack-objects process
You might be able to achieve some speedups if it was not simply a
shell->C conversion, but an overall gc rewrite that did more in a single
process, and reused results (for example, you can reuse all or part of
the history traversal from pack-objects' "counting objects" phase to do
the reachability analysis during prune).
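
To make the reuse idea concrete, here is a rough sketch in Python. It is a toy model, not git's code: the `objects` dictionary, the `refs` list, and `reachable_from` are all made up for illustration, and real details such as packs, loose objects, and prune grace periods are ignored. The point is only that a single walk yields a reachable set that can answer both "what goes in the pack" and "what may be pruned".

```python
# Toy model only: each object id maps to the ids it references.
# None of these names correspond to real git data structures.
objects = {
    "commit2": ["commit1", "tree2"],
    "commit1": ["tree1"],
    "tree2": ["blobA", "blobB"],
    "tree1": ["blobA"],
    "blobA": [],
    "blobB": [],
    "blobOld": [],  # not referenced from any ref
}
refs = ["commit2"]


def reachable_from(tips, graph):
    """Walk the graph once and return every object id reachable from tips."""
    seen = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        stack.extend(graph[oid])
    return seen


# One traversal, two consumers.
reachable = reachable_from(refs, objects)
to_pack = sorted(reachable)                  # what the packing step would want
to_prune = sorted(set(objects) - reachable)  # what the pruning step would want
```

In this model the second consumer costs only a set difference, whereas running the two steps as separate processes means walking the history twice.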
But I'd be very wary of stuffing too many things in a single process.
There are parts of the code that make assumptions about which objects
have been seen in the global object hash table (I believe index-pack is
one of these; see check_objects). And there are parts of the code which
must run separately (e.g., the connectivity check after transfer runs in
a separate process, both because it may die(), but also because we want
a clean slate of which packs are available, with no caching of results
we may have seen).
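
A minimal sketch of the kind of hidden-state problem I mean, again in Python and again purely hypothetical (the module-level `seen` set stands in for flags kept in a process-wide object table; this is not how index-pack or any other git command is actually written):

```python
# Hypothetical stand-in for "seen" flags stored in a global object table.
seen = set()


def walk_unseen(tips, graph):
    """Return the objects reached from tips that were not already marked seen."""
    new = []
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)  # side effect that outlives this call
        new.append(oid)
        stack.extend(graph[oid])
    return sorted(new)


graph = {"c": ["t"], "t": ["b"], "b": []}

first = walk_unseen(["c"], graph)   # finds all three objects
second = walk_unseen(["c"], graph)  # same question, but finds nothing new
```

When each step is its own process, every walk starts with an empty table and the second call would see the same objects as the first. Once two steps share a process, the later one silently inherits whatever the earlier one marked.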
None of those problems is unsolvable, but it's very hard to know when
one is going to pop up and bite you. And because the repacking and
pruning code is the most likely place for a bug to cause data loss, it
makes me a bit nervous.
Another way to reuse the history traversal is to generate the
much-discussed pack reachability bitmaps, and then use them in