On Thu, 16 Feb 2023 09:24:08 -0500
Rich Freeman <ri...@gentoo.org> wrote:

> On Thu, Feb 16, 2023 at 8:39 AM Peter Humphrey <pe...@prh.myzen.co.uk> wrote:
> >
> > I've just looked at 'man make', from which it's clear that -j = --jobs, and
> > that both those and --load-average are passed to /usr/bin/make, presumably
> > untouched unless portage itself has identically named variables. So I wonder
> > how feasible it might be for make to incorporate its own checks to ensure that
> > the load average is not exceeded. I am not a programmer (not for at least 35
> > years, anyway), so I have to leave any such suggestion to the experts.
> >
>
> Well, if we just want to have a fun discussion here are my thoughts.
> However, the complexity vs usefulness outside of Gentoo is such that I
> don't see it happening.
>
> For the most typical use case - a developer building the same thing
> over and over (which isn't Gentoo), then make could cache info on
> resources consumed, and use that to make more educated decisions about
> how many tasks to launch.  That wouldn't help us at all, but it would
> help the typical make user.  However, the typical make user can just
> tune things in other ways.
>
> It isn't going to be possible for make to estimate build complexity in
> any practical way.  Halting problem aside maybe you could build in
> some smarts looking at the program being executed and its arguments,
> but it would be a big mess.
>
> Something make could do is tune the damping a bit.  It could gradually
> increase the number of jobs it runs and watch the load average, and
> gradually scale it up appropriately, and gradually scale down if CPU
> is the issue, or rapidly scale down if swap is the issue.  If swapping
> is detected it could even suspend most of the tasks it has spawned and
> then gradually continue them as other tasks finish to recover from
> this condition.  However, this isn't going to work as well if portage
> is itself spawning parallel instances of make - they'd have to talk to
> each other or portage would somehow need to supervise things.
>
> A way of thinking about it is that when you have portage spawning
> multiple instances of make, that is a bit like adding gain to the
> --load-average MAKEOPTS.  So each instance of make independently looks
> at load average and takes action.  So you have an output (compilers
> that create load), then you sample that load with a time-weighted
> average, and then you apply gain to this average, and then use that as
> feedback.  That's basically a recipe for out of control oscillation.
> You need to add damping and get rid of the gain.
>
> Disclaimer: I'm not an engineer and I suspect a real engineer would be
> able to add a bit more insight.
>
> Really though the issue is that this is the sort of thing that only
> impacts Gentoo and so nobody else is likely to solve this problem for
> us.
>

Given all your explanation, and out of my own annoyance, a couple of years
ago I hacked together a little helper that sits between make and the build
jobs it spawns. What annoyed me was that chromium would compile for
hours and then fail because a build job needed more memory than was
available, and that would fail the whole build.
One possible solution is to reduce the number of build jobs to e.g. -j1
for chromium, but this is stupid because 99% of the time -j16 would
work just fine.

So I hacked around a bit and came up with a little helper and watcher. The
helper stops spawning new jobs once SOME_LIMIT jobs are running, or when
the load is too high (e.g. because I am doing other work on the PC that's
not under emerge's control). The watcher kills a memory-hungry build job
once memory usage is higher than 90%, tells the helper to stop spawning new
jobs, waits until the helper reports that no more build jobs are
running, and then respawns the memory-hungry build job (i.e. the
memory-hungry build job runs essentially as if -j1 had been specified).
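
A minimal sketch of the helper and watcher decisions described above, in
Python. This is not the actual helper (which was never published); the
names Job, may_spawn, pick_victim and may_respawn_victim are hypothetical,
and choosing the job with the largest memory footprint as the one to kill
is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

MEM_THRESHOLD = 0.90  # from the text: act once memory usage exceeds 90%


@dataclass
class Job:
    name: str        # e.g. the object file being built (illustrative)
    mem_bytes: int   # current memory footprint of the build job


def may_spawn(n_running: int, limit: int, load: float,
              max_load: float, draining: bool) -> bool:
    """Helper side: allow a new build job only while fewer than `limit`
    jobs run, the load is acceptable, and the watcher has not asked the
    helper to stop spawning (`draining`)."""
    return not draining and n_running < limit and load < max_load


def pick_victim(jobs: List[Job], mem_used: int,
                mem_total: int) -> Optional[Job]:
    """Watcher side: if memory usage exceeds the threshold, return the
    hungriest job so it can be killed and respawned later."""
    if not jobs or mem_used / mem_total <= MEM_THRESHOLD:
        return None
    return max(jobs, key=lambda j: j.mem_bytes)


def may_respawn_victim(n_running: int) -> bool:
    """The killed job is restarted only once nothing else is running,
    so it effectively runs as if -j1 had been specified."""
    return n_running == 0
```

With SOME_LIMIT = 16, may_spawn(3, 16, 2.0, 8.0, False) allows another job,
while the same call with draining=True refuses it until the killed job has
been rerun on its own.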

This way I can mix emerge --jobs=HIGH_NUMBER and make
-jOTHER_HIGH_NUMBER without overloading the system, because the
total number of actual build jobs is controlled by the helper and
never goes beyond SOME_LIMIT, even if HIGH_NUMBER*OTHER_HIGH_NUMBER > SOME_LIMIT.
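
To illustrate why a single shared limit caps the total regardless of how
emerge and make are configured, here is a small sketch in Python. It uses
threads and a semaphore inside one process as a stand-in; a real helper
would need some cross-process mechanism, and the function name
peak_concurrency and its parameters are hypothetical.

```python
import threading
import time


def peak_concurrency(emerge_jobs: int, make_jobs: int, some_limit: int) -> int:
    """Start emerge_jobs * make_jobs would-be build jobs and return the
    highest number that actually ran at the same time."""
    slots = threading.BoundedSemaphore(some_limit)  # the shared SOME_LIMIT
    lock = threading.Lock()
    active = 0
    peak = 0

    def build_job() -> None:
        nonlocal active, peak
        with slots:                  # block until one of the slots is free
            with lock:
                active += 1
                peak = max(peak, active)
            time.sleep(0.001)        # stand-in for the compiler running
            with lock:
                active -= 1

    threads = [threading.Thread(target=build_job)
               for _ in range(emerge_jobs * make_jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

For example, peak_concurrency(4, 8, 5) requests 32 jobs but never reports
more than 5 running at once.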

I never published this anywhere, but if there's interest in it, I can
probably upload it somewhere. I had the feeling that it's quite
hacky and not worth publishing. Also, I was never sure whether it breaks
emerge in some way, because it's very low-level, but it has now been running
for more than a year without any emerge failure due to this hijacking.
