Jeff King <p...@peff.net> wrote:
>On Sat, Jan 26, 2013 at 10:32:42PM -0800, Junio C Hamano wrote:
>> Both make sense to me.
>> I also wonder if we would be helped by another "repack" mode that
>> coalesces small packs into a single one with minimum overhead, and
>> run that often from "gc --auto", so that we do not end up having to
>> have 50 packfiles.
>> When we have 2 or more small and young packs, we could:
>> - iterate over idx files for these packs to enumerate the objects
>> to be packed, replacing the read_object_list_from_stdin() step;
>> - always choose to copy the data we have in these existing packs,
>> instead of doing a full prepare_pack(); and
>> - use the order the objects appear in the original packs, bypassing
>> compute_write_order().
>I'm not sure. If I understand you correctly, it would basically just be
>concatenating packs without trying to do delta compression between the
>objects which are ending up in the same pack. So it would save us from
>having to do (up to) 50 binary searches to find an object in a pack, but
>would not actually save us much space.
>I would be interested to see the timing on how quick it is compared to
>real repack, as the I/O that happens during a repack is non-trivial
>(although if you are leaving aside the big "main" pack, then it is
>probably not bad).
>But how do these somewhat mediocre concatenated packs get turned into
>real packs? Pack-objects does not consider deltas between objects in
>the same pack. And when would you decide to make a real pack? How do you
>know you have 50 young and small packs, and not 50 mediocre coalesced
>packs?
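For what it's worth, something close to the coalescing step can already be
approximated with existing plumbing. This is only a sketch (the idx file
names are placeholders), and note that pack-objects reuses existing pack
data by default, so --window=0 just stops it from searching for new deltas:

# Enumerate the objects in a few small packs and write them into one
# combined pack, with delta search disabled.
for idx in .git/objects/pack/pack-aaaa.idx .git/objects/pack/pack-bbbb.idx
do
	git show-index <"$idx"       # prints "offset name (crc32)" lines
done |
awk '{print $2}' |                   # second column is the object name
git pack-objects --window=0 .git/objects/pack/pack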
If we are reconsidering repacking strategies, I would like to propose an
approach that might be a more general improvement, one that helps in more
situations.
You could roll together any packs that are close in size, say within 50% of
each other. With this strategy you end up with packs whose sizes are spread
out exponentially. I implemented this strategy on top of the current gc
script using keep files, and it works fairly well.
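A minimal sketch of the idea (illustrative, not my exact script; it assumes
GNU stat and pack names without whitespace, and merges only the smallest
tier on each run):

#!/bin/sh
# Size-tiered repack sketch: protect every pack above the first big size
# gap with a .keep file, then let "git repack -a -d" roll up the rest.
RATIO=5    # a pack more than RATIO times bigger than the next one down
           # starts a new tier and is left alone

cd "$(git rev-parse --git-dir)/objects/pack" || exit 1

prev=0
keeping=0
created=
for pack in $(ls -rS pack-*.pack 2>/dev/null)    # smallest first
do
	size=$(stat -c %s "$pack")    # GNU stat; "stat -f %z" on BSD
	if test $keeping = 0 && test $prev -gt 0 &&
	   test $size -gt $((prev * RATIO))
	then
		keeping=1
	fi
	if test $keeping = 1 && ! test -e "${pack%.pack}.keep"
	then
		echo size-tiered gc >"${pack%.pack}.keep"
		created="$created ${pack%.pack}.keep"
	fi
	prev=$size
done

git repack -a -d    # rolls up only the unprotected (smallest) tier
for k in $created; do rm -f "$k"; done

A trap to remove the .keep files on an aborted run would make this more
robust; I have left it out for brevity.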
This saves some time, but mostly it saves I/O when repacking regularly. I
suspect that if this strategy were used in core git, further optimizations
could be made to also reduce the repack time, but I don't know enough about
repacking to say. We run it nightly on our servers, both the writable ones
and the read-only mirrors. We currently use a ratio of 5, to drastically
reduce how often the large packs are rewritten.