Ludovic Courtès writes:

> Hello!
>
> Janneke Nieuwenhuizen <[email protected]> skribis:
>
>>>From ad94f06620e53fcc1495a2e2479dfc627177047c Mon Sep 17 00:00:00 2001
>> Message-ID:
>> <ad94f06620e53fcc1495a2e2479dfc627177047c.1692783678.git.jann...@gnu.org>
>> From: Janneke Nieuwenhuizen <[email protected]>
>> Date: Thu, 22 Jun 2023 08:30:25 +0200
>> Subject: [PATCH v4] self: Build directories in chunks of max 25 files at a
>> time.
>>
>> Similar to split build of make-go in Makefile.am, this breaks-up building
>> directories into chunks of max 25 files.  Also force garbage collection.
>
> The big difference with ‘make-go’ is that ‘make-go’ spawns a new process
> for each chunk of files: each process starts with an empty heap, which
> is not the case here as we reuse the same process.

Right.

> However, (guix self) is already splitting gnu/packages/*.scm in two
> pieces: ‘guix-packages-base’ and ‘guix-packages’.  The former is the
> closure of (gnu packages base), and the latter contains the remaining
> files.  Unfortunately this is uneven:

Okay...

> $ readlink -f $(type -P guix)
> /gnu/store/12p5axbr4gjrghlrqa4ikmhsxwq2wgw3-guix-command
> $ guix gc -R /gnu/store/12p5axbr4gjrghlrqa4ikmhsxwq2wgw3-guix-command|grep packages-base
> /gnu/store/ivprgy9b2lv8wmkm10wkypf7k24cdifb-guix-packages-base
> /gnu/store/05pjlcfcfa0k9y833nnxxxjcn5mqr8zj-guix-packages-base-source
> /gnu/store/gnxjbyfwfmb216krz2x0cf1z5k1lla9x-guix-packages-base-modules
> $ find /gnu/store/ivprgy9b2lv8wmkm10wkypf7k24cdifb-guix-packages-base -type f |wc -l
> 361
> $ guix gc -R /gnu/store/12p5axbr4gjrghlrqa4ikmhsxwq2wgw3-guix-command|grep packages$
> /gnu/store/8cda50hsayydrlw0qrhcy8q4dr9f1avx-guix-locale-guix-packages
> ludo@ribbon ~/src/guix [env]$ find /gnu/store/8cda50hsayydrlw0qrhcy8q4dr9f1avx-guix-locale-guix-packages | wc -l
> 64
> $ guix describe
> Generation 271  Aug 20 2023 23:48:59  (current)
>   guix a0f5885
>     repository URL: https://git.savannah.gnu.org/git/guix.git
>     branch: master
>     commit: a0f5885fefd93a3859b6e4b82b18a6db9faeee05
>
> Maxime Devos looked into this a while back:
>
>   https://issues.guix.gnu.org/54539

Oh my....

>> * guix/self.scm (compiled-modules)[process-directory]: Split building of
>> directories into chunks of max 25 files.

>> +          (for-each
>> +           (lambda (chunck)
>
> s/chunck/chunk/

Oops, fixed.

> Can you confirm that this reduces memory usage observably?  One way to
> check that would be to print (gc-stats) from ‘process-directory’, with
> and without the change.  Could you give it a try?

What a good and seemingly simple question.  After a week of
instrumentation and testing, my answer can only be: I tried, and maybe
(see below).

> Intuitively, I don’t see why it would eat less memory; maybe peak memory
> usage is lower because we do less at once?

Okay...

> Also, I think we should remove the explicit (gc) call: it should not be
> necessary, and if we depend on that, something’s wrong.

> Anyhow, thanks for tackling this issue!

Hehe.
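For reference, here is a minimal sketch of the kind of instrumentation
suggested above: split the file list into chunks of at most 25 and print
the heap-size entry of Guile's (gc-stats) after each chunk.  The names
chunk, heap-size, process-in-chunks and compile-one are hypothetical,
made up for illustration; this is not the actual process-directory code
in guix/self.scm.

```scheme
;; A minimal sketch, assuming Guile 2.2 or 3.0.  CHUNK, HEAP-SIZE and
;; PROCESS-IN-CHUNKS are hypothetical helpers, and COMPILE-ONE stands
;; in for whatever compiles a single file; none of this is the actual
;; process-directory code from (guix self).
(use-modules (srfi srfi-1))             ;for 'split-at' and 'fold'

(define (chunk lst n)
  "Split LST into consecutive sublists of at most N elements."
  (if (null? lst)
      '()
      (call-with-values
          (lambda () (split-at lst (min n (length lst))))
        (lambda (head tail)
          (cons head (chunk tail n))))))

(define (heap-size)
  "Return the current GC heap size in bytes, as reported by 'gc-stats'."
  (assq-ref (gc-stats) 'heap-size))

(define (process-in-chunks files compile-one max-files)
  "Call COMPILE-ONE on each of FILES, MAX-FILES at a time, printing the
heap size after each chunk.  Return the largest heap size observed."
  (fold (lambda (files-chunk peak)
          (for-each compile-one files-chunk)
          (let ((size (heap-size)))
            (format #t "~a file(s) done, heap-size: ~a~%"
                    (length files-chunk) size)
            (max peak size)))
        0
        (chunk files max-files)))
```

Because every chunk runs in the same process, the heap never starts
from empty between chunks (the difference from make-go pointed out
above), so the printed values track how the single heap grows rather
than the cost of each chunk in isolation.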

You've probably seen Josselin's recent GraphML backend effort, which might
really help to address this?  I'm afraid this patch can at best postpone
what really needs to be done...

There is no gc-stats output from a successful `guix pull' or `make
as-derivation' on Guix/Hurd that I can show you: I've tried more than 20
times and it always fails (OOM, hang, spontaneous reset, ...).  Below is a
typical gc-stats output on the Hurd when building self.scm, at the point
where heap-size peaks (using the max 25 files patch):

--8<---------------cut here---------------start------------->8---
((gc-time-taken . 1530)
 (heap-size . 2,625,474,560)
 (heap-free-size . 1127989248)
 (heap-total-allocated . 1337029496)
 (heap-allocated-since-gc . 28728)
 (protected-objects . 28)
 (gc-times . 324))
--8<---------------cut here---------------end--------------->8---

Notice that heap-size is *much* bigger (more than twice) than my findings
on linux-64 below.  I have no idea why this is or what it might mean...

So I turned to Guix GNU/Linux to get some gc-stats measurements.  What you
see below is the maximum heap-size at any point.  (I also have
heap-total-allocated, but I think that's irrelevant?  And initially I
didn't use a script that measured the time.)

--8<---------------cut here---------------start------------->8---
* guix/self.scm: Vanilla, not chunked; print gc-stats.

((gc-time-taken . 27319485051)
 (heap-size . 1,360,330,752)
 (heap-free-size . 285,696,000)
 (heap-total-allocated . 74,067,590,944)
 (heap-allocated-since-gc . 186,250,144)
 (protected-objects . 28)
 (gc-times . 464))
real	24m36.643s

* guix/self.scm: Split building of directories into 26 chunks; print gc-stats.

(heap-size . 1,131,298,816)

* guix/self.scm: Split building of directories into 26 chunks; no gc; print gc-stats.

(heap-size . 1,121,116,160)

* guix/self.scm: Chunks of 25 files; run gc; print gc-stats.

(heap-size . 1,066,725,376)

* guix/self.scm: Chunks of 50 files; no gc; print gc-stats.

(heap-size . 1,299,230,720)
real	26m40.708s

* guix/self.scm: Chunks of 25 files; no gc; print gc-stats.

(heap-size . 1,024,045,056) ; 1st run
real	28m4.451s

* guix/self.scm: Chunks of 10 files; no gc; print gc-stats.

(heap-size . 1,077,895,168)
real	30m14.049s
--8<---------------cut here---------------end--------------->8---

...strangely enough, if we assume that these statistics translate to the
Hurd, using chunks of max 25 files seems to be a sort of sweet spot?
About 25% less peak memory (~336MB) than vanilla, and "only" ~14% (3m28s)
slower... though that is not great for GNU/Linux users...

I have produced a handful of successful `guix pull's (from a local
checked-out worktree) using the 26-way split and the chunks of max 25
files patches, but sadly many more attempts failed.  Initially, when
creating this patch series, I was convinced this fixed building on the
Hurd, but I'm much less enthusiastic now.  I still have a slight
preference for the latest max-25-files patch, but I'm sorry to say that I
cannot back it up with tangible data.

All in all a rather discouraging week, with much effort spent for little
gain.  Hopefully Josselin can do some of his magic here :)

Greetings,
Janneke

-- 
Janneke Nieuwenhuizen <[email protected]>  | GNU LilyPond https://LilyPond.org
Freelance IT https://www.JoyOfSource.com | Avatar® https://AvatarAcademy.com
