Package: chrony
Version: 4.2-2
Severity: important
Tags: patch
X-Debbugs-Cc: mcg...@kernel.org

Dear Maintainer,

When using the new kdevops [0] reboot-limit [1] test to see how many reboots
can happen with debian-testing without a failure, I ran 3 tests
with different kernels, with the following observations. The point of the
test is simply to instantiate vagrant debian-testing guests, then
reboot them and detect with ansible whether ssh access to the guest is
possible. The test fails upon an ssh timeout or crash. In the table below
a + indicates the test is still running. Each number expresses how many
reboots completed successfully.

kernel         | reboots     | with-fix
---------------+-------------+------------------------
v5.10.105      | 500         | not-tested-yet
v5.17-rc7      | 1,200       | 2,000+
5.17.0-1-amd64 | 3,300+      | first-run-still-running

Upon inspecting the failed boots on v5.10.105 and v5.17-rc7 I
noticed the following on both systems:

root@rebootlimit ~ # sudo systemctl list-units --failed
  UNIT              LOAD   ACTIVE SUB    DESCRIPTION
● ifup@eth0.service loaded failed failed ifup for eth0

I then see the following (scraped from a console, sorry about the formatting):

]: DHCPDISCOVER on eth0 to 255.255.255.255 port 67 interval 3
]: DHCPOFFER of 192.168.121.240 from 192.168.121.1
]: DHCPREQUEST for 192.168.121.240 on eth0 to 255.255.255.255 port 67
]: DHCPACK of 192.168.121.240 from 192.168.121.1
]: bound to 192.168.121.240 -- renewal in 1699 seconds.
nd to 192.168.121.240 -- renewal in 1699 seconds.
-parts: /etc/network/if-up.d/chrony exited with return code 1
p: failed to bring up eth0
ifup@eth0.service: Main process exited, code=exited, status=1/FAILURE
ifup@eth0.service: Failed with result 'exit-code'.

The important line is:

May 21 10:58:58 rebootlimit sh[693]: run-parts: /etc/network/if-up.d/chrony exited with return code 1

Using $(virsh net-dhcp-leases vagrant-libvirt) I see no takers of the IP
address, so there have been no clashes. My next best guess, given
the lack of output from chrony, is that this is a race on bootup.
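
A minimal sketch of how one failing command would fail the whole hook: the
installed script runs under set -e (see the Configuration Files section
below), so if chronyc onoffline returns non-zero inside the if block, the
script stops there and never reaches its final exit 0. In the sketch,
fake_chronyc is a hypothetical stand-in for chronyc and is simply assumed to
fail, as it might if chronyd is not ready to answer yet. Whether that is what
actually happens on the failed boots has not been confirmed.

```shell
#!/bin/sh
# Hook body modelled on /etc/network/if-up.d/chrony. fake_chronyc is a
# hypothetical stand-in that always fails; "if true" replaces the pid
# file check so the sketch runs anywhere.
hook='set -e
fake_chronyc() { return 1; }
if true; then
    fake_chronyc > /dev/null 2>&1
fi
exit 0'

# Run the hook as a separate sh process, as run-parts would, and
# capture its exit status.
rc=0
sh -c "$hook" || rc=$?
echo "hook exit code: $rc"   # prints "hook exit code: 1"
```

The non-zero status is what run-parts reports and what makes ifup give up on
eth0.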

I'm still testing things but the following adjustment seems to have
helped so far.

--- /etc/network/if-up.d/chrony.old     2022-05-24 16:40:53.112439882 +0000
+++ /etc/network/if-up.d/chrony 2022-05-24 16:41:23.452471796 +0000
@@ -5,6 +5,7 @@
 [ -x /usr/sbin/chronyd ] || exit 0
 
 if [ -e /run/chrony/chronyd.pid ]; then
+    systemctl is-system-running --wait
     chronyc onoffline > /dev/null 2>&1
 fi
 

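If the failure is indeed a short-lived race, another option that has not been
tested here would be to retry the command a few times instead of waiting for
the whole system to finish booting. The sketch below is only an illustration:
retry and flaky are hypothetical helpers, and flaky stands in for a command
that fails twice before succeeding.

```shell
#!/bin/sh
# retry N DELAY CMD...: run CMD until it succeeds, at most N attempts,
# sleeping DELAY seconds between attempts. Returns 0 on success or the
# exit status of the last failed attempt.
retry() {
    attempts=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        status=$?
        [ "$n" -ge "$attempts" ] && return "$status"
        n=$((n + 1))
        sleep "$delay"
    done
}

# Demo: flaky is a hypothetical stand-in that fails on its first two
# calls and succeeds on the third.
count_file=$(mktemp)
echo 0 > "$count_file"
flaky() {
    c=$(($(cat "$count_file") + 1))
    echo "$c" > "$count_file"
    [ "$c" -ge 3 ]
}

retry 5 0 flaky
rc=$?
calls=$(cat "$count_file")
rm -f "$count_file"
echo "rc=$rc calls=$calls"   # prints "rc=0 calls=3"
```

In the hook this would be used as "retry 5 1 chronyc onoffline > /dev/null 2>&1";
the attempt count and delay are arbitrary values picked for illustration.
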
[0] https://github.com/linux-kdevops/kdevops
[1] https://github.com/linux-kdevops/kdevops/blob/master/workflows/demos/reboot-limit/Kconfig

-- System Information:
Debian Release: bookworm/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)

Kernel: Linux 5.10.105 (SMP w/8 CPU threads)
Kernel taint flags: TAINT_UNSIGNED_MODULE
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages chrony depends on:
ii  adduser              3.121
ii  init-system-helpers  1.62
ii  iproute2             5.17.0-2
ii  libc6                2.33-7
ii  libcap2              1:2.44-1
ii  libedit2             3.1-20210910-1
ii  libgnutls30          3.7.4-2
ii  libnettle8           3.7.3-1
ii  libseccomp2          2.5.4-1
ii  tzdata               2022a-1
ii  ucf                  3.0043

chrony recommends no packages.

Versions of packages chrony suggests:
ii  bind9-dnsutils [dnsutils]  1:9.18.1-1
pn  networkd-dispatcher        <none>

-- Configuration Files:
/etc/network/if-up.d/chrony changed:
set -e
[ -x /usr/sbin/chronyd ] || exit 0
if [ -e /run/chrony/chronyd.pid ]; then
    systemctl is-system-running --wait
    chronyc onoffline > /dev/null 2>&1
fi
exit 0


-- no debconf information