Package: chrony
Version: 4.2-2
Severity: important
Tags: patch
X-Debbugs-Cc: mcg...@kernel.org

Dear Maintainer,

I used the new kdevops [0] reboot-limit [1] test to see how many reboots
debian-testing can go through without a failure. I ran 3 tests with
different kernels, with the observations below.

The test simply instantiates vagrant debian-testing guests, reboots them,
and uses ansible to detect whether ssh access to the guest is possible. The
test fails upon an ssh timeout or a crash.

In the table below a + indicates the test is still running. A plain number
is how many reboots completed successfully.

kernel          | reboots | with-fix
--------------------------------------------------------
v5.10.105       | 500     | not-tested-yet
v5.17-rc7       | 1,200   | 2,000+
5.17.0-1-amd64  | 3,300+  | first-run-still-running

Upon inspection of the failed boots on v5.10.105 and v5.17-rc7 I noticed
the following on both systems:

root@rebootlimit ~ # sudo systemctl list-units --failed
  UNIT              LOAD   ACTIVE SUB    DESCRIPTION
● ifup@eth0.service loaded failed failed ifup for eth0

I can then see (scraped from a console, sorry about formatting):

]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
]: DHCPOFFER of 192.168.121.240 from 192.168.121.1
]: DHCPREQUEST for 192.168.121.240 on eth0 to 255.255.255.255 port 67
]: DHCPACK of 192.168.121.240 from 192.168.121.1
]: bound to 192.168.121.240 -- renewal in 1699 seconds.
nd to 192.168.121.240 -- renewal in 1699 seconds.
-parts: /etc/network/if-up.d/chrony exited with return code 1
p: failed to bring up eth0
ifup@eth0.service: Main process exited, code=exited, status=1/FAILURE
ifup@eth0.service: Failed with result 'exit-code'.

The important line is:

May 21 10:58:58 rebootlimit sh[693]: run-parts: /etc/network/if-up.d/chrony exited with return code 1

Using $(virsh net-dhcp-leases vagrant-libvirt) I see no other takers of the
IP address, so there have been no address clashes. Given the lack of output
from chrony, my next best guess is that this is a race on bootup.

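To spot this kind of failure without reading the whole log by hand, a small
filter along the following lines could be used. This is only a sketch:
failed_hooks is a hypothetical helper name, and it assumes the run-parts
message has exactly the wording shown in the important line above.

```shell
#!/bin/sh
# failed_hooks: read log text on stdin and print "<hook path> <return code>"
# for every line of the form
#   run-parts: <hook path> exited with return code <N>
# (hypothetical helper; the message format is taken from the log line above)
failed_hooks() {
    sed -n 's/.*run-parts: \(.*\) exited with return code \([0-9][0-9]*\).*/\1 \2/p'
}
```

On a failed guest it could be fed from the journal of the current boot, for
example: journalctl -b -u ifup@eth0.service | failed_hooks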
I'm still testing things but the following adjustment seems to have helped
so far.

--- /etc/network/if-up.d/chrony.old	2022-05-24 16:40:53.112439882 +0000
+++ /etc/network/if-up.d/chrony	2022-05-24 16:41:23.452471796 +0000
@@ -5,6 +5,7 @@
 [ -x /usr/sbin/chronyd ] || exit 0
 
 if [ -e /run/chrony/chronyd.pid ]; then
+    systemctl is-system-running --wait
     chronyc onoffline > /dev/null 2>&1
 fi
 

[0] https://github.com/linux-kdevops/kdevops
[1] https://github.com/linux-kdevops/kdevops/blob/master/workflows/demos/reboot-limit/Kconfig

-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.105 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_UNSIGNED_MODULE
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages chrony depends on:
ii  adduser              3.121
ii  init-system-helpers  1.62
ii  iproute2             5.17.0-2
ii  libc6                2.33-7
ii  libcap2              1:2.44-1
ii  libedit2             3.1-20210910-1
ii  libgnutls30          3.7.4-2
ii  libnettle8           3.7.3-1
ii  libseccomp2          2.5.4-1
ii  tzdata               2022a-1
ii  ucf                  3.0043

chrony recommends no packages.

Versions of packages chrony suggests:
ii  bind9-dnsutils [dnsutils]  1:9.18.1-1
pn  networkd-dispatcher        <none>

-- Configuration Files:
/etc/network/if-up.d/chrony changed:
set -e
[ -x /usr/sbin/chronyd ] || exit 0
if [ -e /run/chrony/chronyd.pid ]; then
    systemctl is-system-running --wait
    chronyc onoffline > /dev/null 2>&1
fi
exit 0


-- no debconf information
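
Regarding the if-up.d change: if this really is a race between chronyd
coming up and the hook running, another option might be to retry the chronyc
call a few times instead of waiting for the whole system. The sketch below
was not part of the test runs reported here. retry is a hypothetical helper,
the attempt count and delay are arbitrary, and it assumes that chronyc
returns non-zero while chronyd is not yet ready to answer.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most ATTEMPTS times, sleeping DELAY seconds
# between failed attempts. Returns 0 on the first success, 1 otherwise.
# (hypothetical helper, not part of chrony or ifupdown)
retry() {
    retry_attempts=$1
    retry_delay=$2
    shift 2
    retry_i=1
    while :; do
        if "$@"; then
            return 0
        fi
        if [ "$retry_i" -ge "$retry_attempts" ]; then
            return 1
        fi
        retry_i=$((retry_i + 1))
        sleep "$retry_delay"
    done
}

# Possible use inside /etc/network/if-up.d/chrony (untested assumption):
#
#   if [ -e /run/chrony/chronyd.pid ]; then
#       retry 5 1 chronyc onoffline > /dev/null 2>&1
#   fi
```

Because the hook runs under set -e, a final failure of retry would still
make it exit non-zero, the same as the unpatched script. As far as I
understand, systemctl is-system-running also returns non-zero when the
system ends up in a state other than "running" (for example "degraded"), so
the same caveat may apply to the adjustment I am testing.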