Hello Ludovic,

Ludovic Courtès <[email protected]> writes:

> Hi!
>
> zimoun <[email protected]> skribis:
>
>> I am just hitting this old bug#24496 [1].
>>
>> On Mon, 26 Sep 2016 at 18:20, [email protected] (Ludovic Courtès) wrote:
>>> ng0 <[email protected]> skribis:
>>>
>>>> When I forgot that my build machine is offline and I did not pass
>>>> --no-build-hook, the offloading keeps trying forever until I had to
>>>> cancel the build, boot the build-machine and started the build again.
>>
>> [...]
>>
>>> Like you say, on Hydra-style setup this could be a problem: the
>>> front-end machine may have --max-jobs=0, meaning that it cannot perform
>>> builds on its own.
>>>
>>> So I guess we would need a command-line option to select a different
>>> behavior. I’m not sure how to do that because ‘guix offload’ is
>>> “hidden” behind ‘guix-daemon’, so there’s no obvious place for such an
>>> option.
>>
>> When the build machine used to offload is offline and the master daemon
>> is --max-jobs=0, I expect X tries (leading to timeout) and then just
>> fails with a hint, where X is defined by user. WDYT?
>>
>>
>>> In the meantime, you could also hack up your machines.scm: it would
>>> return a list where unreachable machines have been filtered out.
>>
>> Maybe, this could be done by “guix offload”.
>
> Prior to commit efbf5fdd01817ea75de369e3dd2761a85f8f7dd5, this was the
> case: an unreachable machine would have ‘machine-load’ return +inf.0,
> and so it would be discarded from the list of candidates.
>
> However, I think this behavior was unintentionally lost in
> efbf5fdd01817ea75de369e3dd2761a85f8f7dd5. Maxim, WDYT?

I just reviewed this commit, and don't see anywhere where the behavior
would have changed. The discarding happens here:

--8<---------------cut here---------------start------------->8---
-        (if (and node (< load 2.) (>= space %minimum-disk-space))
+        (if (and node
+                 (or (not threshold) (< load threshold))
+                 (>= space %minimum-disk-space))
--8<---------------cut here---------------end--------------->8---

previously load could be set to +inf.0. Now it is a float between 0.0
and 1.0, with threshold defaulting to 0.6.

As far as I remember, this has always been a problem for me (busy
offload machines being forever retried with no fallback to the local
machine).

Thanks,

Maxim
