Control: tag -1 patch

Hi!
On Mon, 2025-08-11 at 01:51:50 +0200, Guillem Jover wrote:
> I was looking at this again just now, and I think the subsequent git
> fetches are causing the problem. On my server the dpkg.git repo is
> 180 MiB (and I've not run «git gc --aggressive» for a while).
>
> Trying to replicate what the vcswatch data gathering script is doing,
> I got the following:
>
> ,---
> # Initial clone
> $ git clone --quiet --bare --mirror --depth 50 --filter tree:0 \
>     --no-single-branch --template '' \
>     https://git.dpkg.org/git/dpkg/dpkg.git dpkg.git
> warning: filtering not recognized by server, ignoring
> $ cd dpkg.git
> $ du -sh .
> 57M	.
> # Iterative fetch 1
> $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
> […]
> $ du -sh
> 115M	.
> # Iterative fetch 2
> $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
> […]
> $ du -sh
> 173M	.
> # Iterative fetch 3
> $ git -c gc.auto=200 fetch --depth 50 --prune --force origin '*:*'
> […]
> $ du -sh
> 231M	.
> `---
>
> Which I guess keeps increasing until reaching the 500 MiB limit. I
> notice that under objects/pack/ there is one set of similarly sized
> packs (57 MiB) for each iteration.
>
> Running «git gc» on the repo makes things go back to a more normal
> size, as would be expected.

Ok, I think the attached patch should help with this, as it will force
an automatic «git gc» once 4 packs are on disk.

The fetching though still seems rather inefficient, because at least
for dpkg, it will keep requesting to download a pack which is currently
57 MiB big, instead of asking for the few new objects that would
usually be downloaded.

Thanks,
Guillem
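[Editorial note: to illustrate the threshold the patch relies on, «git gc --auto» repacks when the number of packs without a matching .keep file exceeds gc.autoPackLimit. A minimal sketch of that count follows; the function name count_nonkeep_packs is illustrative only, not part of git or vcswatch.]

```shell
# Count the non-keep packs in a bare repository, i.e. the number that
# «git gc --auto» compares against gc.autoPackLimit before repacking.
count_nonkeep_packs() {
    repo="$1"
    n=0
    for p in "$repo"/objects/pack/*.pack; do
        # Guard against an unmatched glob (no packs at all).
        [ -e "$p" ] || continue
        # Packs marked with a .keep file are excluded from auto-repack.
        [ -e "${p%.pack}.keep" ] && continue
        n=$((n + 1))
    done
    echo "$n"
}
```

With gc.autoPackLimit=4 as in the patch, a fifth fetch-created pack would push this count over the limit and trigger a repack on the next auto-gc check.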
From 2f3e9d591a2ea698932ab7af4bbabd5fed8e3dad Mon Sep 17 00:00:00 2001
From: Guillem Jover <guil...@debian.org>
Date: Sat, 16 Aug 2025 17:57:50 +0200
Subject: [PATCH] vcswatch: Force a «git gc» after fetch to avoid hitting
 repo quotas
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The current code requests a depth of 50 commits, and requests automatic
garbage collection once 200 loose objects are in the repository. The
problem is that on each fetch we might get a pack with all the relevant
objects, which will be duplicated with the packs from the previous
fetches, while there will be no loose objects. These packs will keep
stacking up until we hit the repository quota, and further fetching
will be completely disabled.

Instead, request that we do not want more than 4 non-keep packs before
triggering an automatic garbage collection.

Closes: #1107620
Ref: #1072498
---
 data/vcswatch/vcswatch | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/data/vcswatch/vcswatch b/data/vcswatch/vcswatch
index a31bf079..954e4cc7 100755
--- a/data/vcswatch/vcswatch
+++ b/data/vcswatch/vcswatch
@@ -296,7 +296,9 @@ sub process_package ($) {
         }
         runcmd ('darcs', 'pull', '-a');
     } elsif ($pkg->{vcs} eq 'Git') {
-        runcmd ('git', '-c', 'gc.auto=200',
+        runcmd ('git',
+                '-c', 'gc.auto=200',
+                '-c', 'gc.autoPackLimit=4',
                 'fetch',
                 ($pkg->{dumb_http} ? () : ('--depth', '50')),
                 '--prune', '--force', 'origin', '*:*');
-- 
2.50.1