On Sat, Sep 07, 2019 at 01:04:40AM -0400, Jeff King wrote:
> The commit-graph tool may read a lot of commits, but it only cares about
> parsing their metadata (parents, trees, etc) and doesn't ever show the
> messages to the user. And so it should not need save_commit_buffer,
> which is meant for holding onto the object data of parsed commits so
> that we can show them later. In fact, it's quite harmful to do so.
> According to massif, the max heap of "git commit-graph write
> --reachable" in linux.git before/after this patch (removing the commit
> graph file in between) goes from ~1.1GB to ~270MB.
> Which isn't surprising, since the difference is about the sum of the
> uncompressed sizes of all commits in the repository, and this was
> equivalent to leaking them.
> This obviously helps if you're under memory pressure, but even without
> it, things go faster. My before/after times for that command (without
> massif) went from 12.521s to 11.874s, a speedup of ~5%.
> Signed-off-by: Jeff King <p...@peff.net>
> ---
> We didn't actually notice this on linux.git, but rather on a repository
> with 130 million commits (don't ask). With this patch, I was able to
> generate the commit-graph file with a peak heap of ~25GB, which is ~200
> bytes per commit.
> I'll bet we could do better with some effort, but obviously this case
> was just pathological. For most cases this should be cheaper than a
> normal repack (which probably spends that much memory on each object,
> not just commits).
>  builtin/commit-graph.c | 2 ++
>  1 file changed, 2 insertions(+)
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 57863619b7..052696f1af 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -251,6 +251,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
> char *prefix)
>                            builtin_commit_graph_usage,
>                            PARSE_OPT_STOP_AT_NON_OPTION);
> +     save_commit_buffer = 0;
> +

This looks exactly right to me. We had discussed a little bit off-list
about where you might place this line, but I think that the spot you
picked is perfect: as late as possible.

Thankfully, the option parsing code here doesn't load any commits
(though even if it did, I don't think that turning on/off
'save_commit_buffer' would really make much of a difference).

So, the patch here looks obviously correct, and I don't think it needs a
test or anything like that... besides: what is there to test? :).

>       if (argc > 0) {
>               if (!strcmp(argv[0], "read"))
>                       return graph_read(argc, argv);
> --


Reply via email to