Hi,

Tomas Volf <[email protected]> skribis:

>> After spending hours on this and fixing improbable issues in the
>> Shepherd (will push shortly), I found that the root of the problem is
>> exactly what I feared and which led to the patches at
>> <https://issues.guix.gnu.org/76262>.
>>
>> Namely, ‘dhcp-client-service-type’ calls ‘waitpid’; that call competes
>> with the one done by shepherd’s SIGCHLD handler and, if you’re unlucky,
>> it loses the race and waits forever.
>
> Observation here. While yes, based on the description I agree that it
> is (bad) luck based, in practice it seems to be extremely reliable to
> reproduce.

Yes, I could reproduce it 100% with just ‘bare-bones.tmpl’. Thing is, as
soon as you would change something non-trivial, for instance the
‘message-destination’ procedure of shepherd so that it writes everything
to /dev/console, the problem would go away. Even just commenting out
some of the parameters passed to ‘system-log’ could make the problem
disappear (!), which is why it took me a lot of time to figure it out.

>> Could you try your config with the patch at
>> <https://issues.guix.gnu.org/76262#2>, at least in a VM and ideally on
>> the metal?

[...]

> I can confirm the patch 2 fixes the issue for me, both in the VM and on
> physical machine.

Yay!

> Only thing I have noticed that even when deploying the "good" commit, I
> see the following error in the log:
>
> guix deploy: warning: an error occurred while upgrading services on
> '127.0.0.1':
> %exception #<inferior-object #<&service-not-found-error service: system-log>>

I think I understood this one now. The old service has only one name:
syslogd. The new one, which upgrades it, has two names: system-log and
syslogd (system-log is its “canonical name”). The service upgrade
machinery gets confused because it uses the canonical name in one place.

I’ll investigate.

Ludo’.
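
For readers unfamiliar with this kind of race, here is a minimal sketch of two competing ‘waitpid’ callers in one process. It is plain Python, not shepherd or Guix code, and the name ‘demo_race’ is made up for illustration. It forces the SIGCHLD handler to win so the outcome is deterministic. Note that in plain POSIX the losing call fails with ECHILD instead of blocking; the sketch does not attempt to model why the loser waits forever in the shepherd case described above.

```python
import os
import signal
import time


def demo_race(timeout=5.0):
    """Return "handler-won" if the SIGCHLD handler reaped the child first."""
    reaped = []  # PIDs collected by the SIGCHLD handler

    def on_sigchld(signum, frame):
        # Reap every child that has exited, as a process supervisor would.
        while True:
            try:
                pid, _status = os.waitpid(-1, os.WNOHANG)
            except ChildProcessError:
                return  # no children left at all
            if pid == 0:
                return  # children exist, but none has exited yet
            reaped.append(pid)

    previous = signal.signal(signal.SIGCHLD, on_sigchld)
    try:
        child = os.fork()
        if child == 0:
            os._exit(0)  # child process: exit immediately

        # Deliberately let the handler win so the result is deterministic.
        deadline = time.monotonic() + timeout
        while child not in reaped and time.monotonic() < deadline:
            time.sleep(0.01)

        try:
            os.waitpid(child, 0)  # the competing, explicit wait
            return "explicit-wait-won"
        except ChildProcessError:
            return "handler-won"  # the exit status was already consumed
    finally:
        # Restore whatever handler was installed before the demo.
        signal.signal(signal.SIGCHLD,
                      previous if previous is not None else signal.SIG_DFL)


if __name__ == "__main__":
    print(demo_race())
```

Only one caller can collect a given child's exit status, which is why a second ‘waitpid’ in the same process as a reaping SIGCHLD handler is fragile: whichever runs first consumes the status, and the other is left with nothing to collect.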
