Hi,

Maxim Cournoyer <[email protected]> writes:

> Fixes <https://issues.guix.gnu.org/43773>.
>
> The computed normalized load was previously obtained by dividing the load
> average as found in /proc/loadavg by the number of parallel builds defined for
> a build machine.
>
> This normalized didn't allow to compare machines with different number of
                 ^
> cores, as the load average reported by can be as high as the number of cores;
                                        ^
Missing words.

> thus comparing that value to a fixed threshold of 2.0 would mean machines with
> multiple cores were more likely to be flagged as overloaded compared to single
> core machines.
>
> This can be fixed by normalizing using the available number of cores instead
> of the number of parallel jobs.

Indeed, good catch!

> * guix/scripts/offload.scm (<build-machine>)[overload-threshold]: New field.
> (node-load): Modify to return a normalized load value between 0 and 1, taking
> into account the number of cores available.
> (normalized-load): Remove procedure.
> (report-load): New procedure.
> (choose-build-machine): Adjust to use the modified 'node-load' and the new
> 'report-load' and 'build-machine-overload-threshold' procedures.
> (check-machine-status): Adjust.
> * doc/guix.texi (Daemon Offload Setup): Document the offload scheduler and the
> new 'overload-threshold' field.
>
>  doc/guix.texi            | 30 +++++++++++++++++++++-
>  guix/scripts/offload.scm | 54 ++++++++++++++++++++++++----------------
>  2 files changed, 62 insertions(+), 22 deletions(-)

Nice.

[...]

>  (define (node-load node)
> -  "Return the load on NODE.  Return +∞ if NODE is misbehaving."
> +  "Return the load on NODE, a normalized value between 0.0 and 1.0.  The value
> +is derived from /proc/loadavg and normalized according to the number of
> +logical cores available, to give a rough estimation of CPU usage.  Return
> +1.0 (fully loaded) if NODE is misbehaving."
>    (let ((line (inferior-eval '(begin
>                                  (use-modules (ice-9 rdelim))
>                                  (call-with-input-file "/proc/loadavg"
>                                    read-string))
> -                             node)))
> -    (if (eof-object? line)
> -        +inf.0 ;MACHINE does not respond, so assume it is infinitely loaded
> +                             node))
> +        (ncores (inferior-eval '(begin
> +                                  (use-modules (ice-9 threads))
> +                                  (current-processor-count))
> +                               node)))
> +    (if (or (eof-object? line) (eof-object? ncores))
> +        1.0 ;MACHINE does not respond, so assume it is fully loaded

Returning 1.0 now is akin to returning +∞ before, meaning that the
machine will never be picked up, is that right?  What if one sets
overload-threshold = 1.0, the machine would still be picked up, no?

> +          (if (and node
> +                   (or (not threshold) (< load threshold))

I think we can assume that THRESHOLD is always a number, including
possible +inf.0.

Thanks,
Ludo.
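
PS: To make the threshold question above concrete, here is a minimal
sketch, not the code from the patch.  The helper names 'normalize-load'
and 'acceptable?' are hypothetical, and capping the ratio at 1.0 is an
assumption on my part.  It only shows how the quoted '(< load threshold)'
test treats the 1.0 and +inf.0 sentinels.

```scheme
;; Hypothetical helpers for illustration only; these are not the
;; procedures defined in guix/scripts/offload.scm.

(define (normalize-load load-average cores)
  ;; Divide the load average by the number of cores.  Capping the
  ;; result at 1.0 is an assumption made for this sketch.
  (min 1.0 (exact->inexact (/ load-average cores))))

(define (acceptable? load threshold)
  ;; Mirrors the quoted test: a threshold of #f disables the check,
  ;; otherwise LOAD must be strictly below THRESHOLD.
  (or (not threshold) (< load threshold)))

(normalize-load 4.0 8)        ;0.5
(acceptable? 1.0 1.0)         ;#f: the comparison is strict
(acceptable? 1.0 #f)          ;#t: no threshold, the sentinel passes
(acceptable? 1.0 +inf.0)      ;#t: 1.0 is below an infinite threshold
(acceptable? +inf.0 +inf.0)   ;#f: +inf.0 is never below a number
```

In this sketch a 1.0 sentinel is rejected by a threshold of 1.0 but
accepted when the threshold is #f or +inf.0, whereas a +inf.0 sentinel
is rejected by any numeric threshold.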
