On Thu, Mar 01 2018, Nguyễn Thái Ngọc Duy jotted:

> pack-objects could be a big memory hog especially on large repos,
> everybody knows that. The suggestion to stick a .keep file on the
> largest pack to avoid this problem is also known for a long time.
>
> Let's do the suggestion automatically instead of waiting for people to
> come to Git mailing list and get the advice. When a certain condition
> is met, gc --auto create a .keep file temporary before repack is run,
> then remove it afterward.
>
> gc --auto does this based on an estimation of pack-objects memory
> usage and whether that fits in one third of system memory (the
> assumption here is for desktop environment where there are many other
> applications running).
>
> Since the estimation may be inaccurate and that 1/3 threshold is
> arbitrary, give the user a finer control over this mechanism as well:
> if the largest pack is larger than gc.bigPackThreshold, it's kept.

This is very promising. Saves lots of memory on my ad-hoc testing of
adding a *.keep file on an in-house repo.

> +     if (big_pack_threshold)
> +             return pack->pack_size >= big_pack_threshold;
> +
> +     /* First we have to scan through at least one pack */
> +     mem_want = pack->pack_size + pack->index_size;
> +     /* then pack-objects needs lots more for book keeping */
> +     mem_want += sizeof(struct object_entry) * nr_objects;
> +     /*
> +      * internal rev-list --all --objects takes up some memory too,
> +      * let's say half of it is for blobs
> +      */
> +     mem_want += sizeof(struct blob) * nr_objects / 2;
> +     /*
> +      * and the other half is for trees (commits and tags are
> +      * usually insignificant)
> +      */
> +     mem_want += sizeof(struct tree) * nr_objects / 2;
> +     /* and then obj_hash[], underestimated in fact */
> +     mem_want += sizeof(struct object *) * nr_objects;
> +     /*
> +      * read_sha1_file() (either at delta calculation phase, or
> +      * writing phase) also fills up the delta base cache
> +      */
> +     mem_want += delta_base_cache_limit;
> +     /* and of course pack-objects has its own delta cache */
> +     mem_want += max_delta_cache_size;

I'm not familiar enough with this part to say, but isn't this assuming a
lot about the distribution of objects in a way that will cause is not to
repack in some pathological cases?

Probably worth documenting...

> +     /* Only allow 1/3 of memory for pack-objects */
> +     mem_have = total_ram() / 3;

Would be great to have this be a configurable variable, so you could set
it to e.g. 33% (like here), 50% etc.

Reply via email to