On Wed, Aug 7, 2013 at 7:10 AM, Martin Fick <mf...@codeaurora.org> wrote:
>> I wonder if a simpler approach may be nearly as efficient as
>> this one: keep the largest pack out, repack the rest at
>> fetch/push time so there are at most 2 packs at a time.
>> Or we could do the repack at 'gc --auto' time, but
>> with a lower pack threshold (about 10 or so). When the
>> second pack is as big as, say, half the size of the
>> first, merge them into one at "gc --auto" time. This can
>> be easily implemented in git-repack.sh.
> It would definitely be better than the current gc approach.
> However, I suspect it is still at least one to two orders of
> magnitude off from where it should be. To give you a real
> world example, on our server today when gitexproll ran on
> our kernel/msm repo, it consolidated 317 pack files into a
> single packfile of almost 8M (it compresses/dedupes shockingly
> well; one of those new packs was 33M). Our largest packfile in
> that repo is 1.5G!
> So let's now imagine that the second largest packfile is
> only 100M. It would keep getting consolidated with 8M worth
> of data every day (assuming the same conditions and no extra
> compression). That would take (750M-100M)/8M ~ 81 days to
> finally build up large enough to no longer consolidate the
> new packs with the second largest pack file daily. During
> those 80+ days, it will be on average writing 325M too much
> per day (when it should be writing just 8M).
> So I can see the appeal of a simple solution. Unfortunately,
> I think one layer would still "suck" though. And if you are
> going to add even just one extra layer, I suspect that you
> might as well go the full distance since you probably
> already need to implement the logic to do so?
I see. It looks like your way is the best way to go.