Giovanni Biscuolo <g...@xelera.eu> writes:

> Hi,
>
> following a recent discussion on guix-sysadmin I have to confirm the
> ssh-daemon issue since it is still happening on some of the machines I
> administer
>
> Previous possibly related bug reports are
> https://issues.guix.gnu.org/issue/30993 and
> https://issues.guix.gnu.org/issue/32197
>
> Unfortunately this issue is *not* well reproducible, it depends on some
> mysterious (to me) timing factor; AFAIU it does *not* depend on the
> shepherd version, probably it depends on "something" related to IPv6
> (read below the details)

Hello, thank you for this report. I can reproduce it on my box, which
has an old hard disk, and disabling IPv6 for sshd does fix the issue for
me...

>
> Andreas Enge <andr...@enge.fr> writes:
>
> [...]
>
>> My impression is that the problem is still there. I am quite certain it
>> happened when I rebooted dover, since I had to connect on the serial console
>> to manually restart the ssh service.
>
> I'm sure it happened when milano-guix-1 was rebooted due to data centre
> maintenance and happened yesterday to one of my personal Guix machines at
> office
>
> [...]
>
> My situation is similar to the one observed by Andreas
>
>> Well, it is in /var/log/messages:
>> Aug 3 21:11:38 localhost sshd: Server listening on 0.0.0.0 port 22.
>> Aug 3 21:11:55 localhost shepherd: Service ssh-daemon could not be
>> started.
>
> [...]
> Sep 4 21:46:02 localhost shepherd: Service syslogd has been started.
> [...]
> Sep 4 21:46:03 localhost shepherd: Service loopback has been started.
> [...]
> Sep 4 21:46:22 localhost vmunix: [ 0.226337] PCI: Using configuration
> type 1 for base access
> Sep 4 21:46:09 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to
> 255.255.255.255 port 67
> [...]
> Sep 4 21:46:24 localhost shepherd: Service networking has been started.
> [...]
> Sep 4 21:46:12 localhost sshd: Server listening on 0.0.0.0 port 22.
> [...]
> Sep 4 21:46:30 localhost vmunix: [ 0.250107] ACPI: PCI Interrupt Link
> [LNKA] (IRQs 3 4 5 6 10 *11 12 14 15)
> Sep 4 21:46:13 localhost dhclient: DHCPREQUEST for 10.38.2.16 on eno1 to
> 255.255.255.255 port 67
> [...]
> Sep 4 21:46:16 localhost dhclient: DHCPACK of 10.38.2.16 from 10.38.2.1
> [...]
> Sep 4 21:46:33 localhost shepherd: Service ssh-daemon could not be
> started.
> [...]
> Sep 4 21:46:47 localhost vmunix: [ 0.731142] Segment Routing with IPv6
>
>
> Please note the timing of the dhclient and the sshd processes: I
> inserted them as printed in /var/log/messages but they are not
> time-sequential: does it means something or is irrelevant?
>
> So the sshd process started (as far as I can see there is no trace it
> was stopped) and pretty soon shepherd noticed ssh-daemon was not
> started.
>
> Logging in from the console I see the ssh-daemon is stopped but enabled:
>
> Status of ssh-daemon:
> It is stopped.
> It is enabled.
> Provides (ssh-daemon).
> Requires (syslogd loopback).
> Conflicts with ().
> Will be respawned.
>
>
> [...]

Yes, I think that when 'ssh-daemon' fails to start, shepherd should
either respawn it until it succeeds or disable it. But looking at the
code of 'make-forkexec-constructor': when 'pid-file' is used (as
'ssh-daemon' does) and the timeout (%pid-file-timeout, 5s by default) is
reached, the process gets a 'SIGTERM' and '#f' is returned as its
running state. Since '#f' is not a PID, I guess the service won't be
respawned...

To ludo: Is my analysis correct? It's not clear to me how to fix it so
that 'ssh-daemon' can be respawned, though...

>
> If I start it via `sudo herd start ssh-daemon` it immediately starts,
> like in Andreas experience:
>
>> Aug 3 21:13:10 localhost sshd: Server listening on 0.0.0.0 port 22.
>> Aug 3 21:13:10 localhost sshd: Server listening on :: port 22.
>> Aug 3 21:13:11 localhost shepherd: Service ssh-daemon has been started.
>
> Sep 5 13:38:55 localhost sshd: Server listening on 0.0.0.0 port 22.
> Sep 5 13:38:55 localhost sshd: Server listening on :: port 22.
> Sep 5 13:38:55 localhost shepherd: Service ssh-daemon has been started.
>
>
> Please notice the difference from above: this time the sshd server is
> also listening on the IPv6 address :: while in the above log it was only
> listening on the 0.0.0.0 IPv4 address
>
> Does the failure have something to do with IPv6 not available when sshd
> starts for the first time after a reboot?

I agree: adding '(extra-content "ListenAddress 0.0.0.0")' to my
'openssh-configuration', so that sshd skips the IPv6 listener, fixes
this issue for me. A proper fix would be to respawn 'ssh-daemon' and to
start it only once IPv6 is available (I don't know what that means
yet...).
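
For anyone who wants to try the same workaround, here is roughly how it
looks in the 'services' field of an operating-system declaration. This
is only a minimal sketch: it assumes the OpenSSH service is otherwise
left at its defaults, so any other fields you already set have to be
kept alongside 'extra-content'.

```scheme
;; Workaround sketch, not a proper fix: make sshd listen on IPv4 only,
;; so that it does not try to bind the IPv6 address :: at boot.
;; Assumption: no other openssh-configuration fields are customized.
(service openssh-service-type
         (openssh-configuration
          ;; Text appended to the generated sshd configuration file.
          (extra-content "ListenAddress 0.0.0.0")))
```

After reconfiguring and rebooting, sshd should log only the "Server
listening on 0.0.0.0 port 22." line. Note that this drops IPv6 access to
the machine, so it only sidesteps the startup problem.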