I have a Guix (Linux) system on which the Shepherd service manager is
entirely nonresponsive. Attempts to use the `herd` command hang
indefinitely, and inetd services (like SSH) do not work. The system as a
whole /is/ responsive though; the running services continue to operate
normally, and Guix is fully usable. PID 1 does NOT have high CPU usage,
and is processing syslogs properly (messages go to the right files and
terminals). On console, I am able to switch TTYs (though one does not
have a shell on it because Shepherd isn't respawning the tty service for
it) and view the syslog on C-M-F12.

My attempts to debug PID 1 have yielded nothing that I can make sense
of. GDB is unable to get a Scheme stacktrace out of any running threads,
and strace is less than intelligible due to the running services writing
to their logs.

This system is still online, with Shepherd in its hung state, and will
likely be for as long as the services continue to run. Are there any
debugging steps I can take here that might provide more information or
help recover Shepherd?