Package: src:linux
Version: 3.16.36-1+deb8u1
Severity: normal

Hi,

On about 5% of boots, the system hangs for several minutes while waiting
for systemd-udev-settle to complete (systemd-udev-settle is pulled in by
lvm2).

The log shows:
Oct 07 11:40:59 grisou-6.nancy.grid5000.fr systemd-udevd[461]: worker [517] /devices/system/cpu/cpu13 timeout; kill it
Oct 07 11:40:59 grisou-6.nancy.grid5000.fr systemd-udevd[461]: seq 3533 '/devices/system/cpu/cpu13' killed
Oct 07 11:40:59 grisou-6.nancy.grid5000.fr systemd-udevd[461]: worker [517] terminated by signal 9 (Killed)

systemd-udev-settle is then marked as failed because it hit its timeout:
# systemctl status systemd-udev-settle.service
● systemd-udev-settle.service - udev Wait for Complete Device Initialization
   Loaded: loaded (/lib/systemd/system/systemd-udev-settle.service; static)
   Active: failed (Result: timeout) since Thu 2016-10-06 12:46:39 CEST; 1min 57s ago
     Docs: man:udev(7)
           man:systemd-udevd.service(8)
  Process: 456 ExecStart=/bin/udevadm settle (code=killed, signal=TERM)
 Main PID: 456 (code=killed, signal=TERM)


It happens on several machines of different models (all Dell, but I'm not
sure this is relevant, as all our recent machines are Dell). A hardware
issue is unlikely.

It is fixed in stretch and unstable.

I bisected it, and found that commit 6f942a1f264e875c5f3ad6f505d7b500a3e7fa82
fixed it. That commit is:

commit 6f942a1f264e875c5f3ad6f505d7b500a3e7fa82
Author: Peter Zijlstra <pet...@infradead.org>
Date:   Wed Sep 24 10:18:46 2014 +0200

    locking/mutex: Don't assume TASK_RUNNING

    We're going to make might_sleep() test for TASK_RUNNING, because
    blocking without TASK_RUNNING will destroy the task state by setting
    it to TASK_RUNNING.

    There are a few occasions where its 'valid' to call blocking
    primitives (and mutex_lock in particular) and not have TASK_RUNNING,
    typically such cases are right before we set TASK_RUNNING anyhow.

    Robustify the code by not assuming this; this has the beneficial side
    effect of allowing optional code emission for fixing the above
    might_sleep() false positives.

    Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
    Cc: t...@linutronix.de
    Cc: ilya.dryo...@inktank.com
    Cc: umgwanakikb...@gmail.com
    Cc: Oleg Nesterov <o...@redhat.com>
    Cc: Linus Torvalds <torva...@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20140924082241.988560...@infradead.org
    Signed-off-by: Ingo Molnar <mi...@kernel.org>

diff --git a/kernel/locking/mutex.c b/kernel/locking/mutex.c
index dadbf88..4541951 100644
--- a/kernel/locking/mutex.c
+++ b/kernel/locking/mutex.c
@@ -378,8 +378,14 @@ done:
         * reschedule now, before we try-lock the mutex. This avoids getting
         * scheduled out right after we obtained the mutex.
         */
-       if (need_resched())
+       if (need_resched()) {
+               /*
+                * We _should_ have TASK_RUNNING here, but just in case
+                * we do not, make it so, otherwise we might get stuck.
+                */
+               __set_current_state(TASK_RUNNING);
                schedule_preempt_disabled();
+       }

        return false;
 }


Unfortunately, the code around this was changed after 3.16, making a backport
non-trivial.

A workaround on jessie systems, where that is an option, is to not install
lvm2 (which is what pulls in systemd-udev-settle).

Lucas


-- Package-specific info:
** Version:
Linux version 3.16.0-4-amd64 (debian-ker...@lists.debian.org) (gcc version 4.8.4 (Debian 4.8.4-1) ) #1 SMP Debian 3.16.36-1+deb8u1 (2016-09-03)

** Command line:
root=/dev/sda3 console=tty0 console=ttyS0,115200

** Not tainted

** Model information
sys_vendor: Dell Inc.
product_name: PowerEdge R630
product_version:
chassis_vendor: Dell Inc.
chassis_version:
bios_vendor: Dell Inc.
bios_version: 1.3.6
board_vendor: Dell Inc.
board_name: 0CNCJW
board_version: A08

** Loaded modules:
x86_pkg_temp_thermal
intel_powerclamp
ttm
drm_kms_helper
intel_rapl
coretemp
kvm_intel
kvm
crc32_pclmul
aesni_intel
aes_x86_64
lrw
gf128mul
glue_helper
ablk_helper
cryptd
evdev
pcspkr
dcdbas
iTCO_wdt
ipmi_devintf
iTCO_vendor_support
drm
ipmi_si
ipmi_msghandler
mei_me
mei
lpc_ich
shpchp
processor
mfd_core
thermal_sys
wmi
acpi_power_meter
button
autofs4
ext4
crc16
mbcache
jbd2
sg
sd_mod
crc_t10dif
crct10dif_generic
ahci
igb
i2c_algo_bit
ehci_pci
libahci
ixgbe
i2c_core
ehci_hcd
libata
megaraid_sas
dca
crct10dif_pclmul
crct10dif_common
ptp
crc32c_intel
usbcore
pps_core
usb_common
mlx4_core
mdio
scsi_mod

** PCI devices:
not available

** USB devices:
not available

-- System Information:
Debian Release: 8.6
  APT prefers stable
  APT policy: (500, 'stable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.16.0-4-amd64 (SMP w/32 CPU cores)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8) (ignored: LC_ALL set to en_US.UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)

Versions of packages linux-image-3.16.0-4-amd64 depends on:
ii  debconf [debconf-2.0]                   1.5.56
ii  initramfs-tools [linux-initramfs-tool]  0.120+deb8u2
ii  kmod                                    18-3
ii  linux-base                              3.5

Versions of packages linux-image-3.16.0-4-amd64 recommends:
pn  firmware-linux-free  <none>
pn  irqbalance           <none>

Versions of packages linux-image-3.16.0-4-amd64 suggests:
pn  debian-kernel-handbook  <none>
ii  extlinux                3:6.03+dfsg-5+deb8u1
pn  linux-doc-3.16          <none>

Versions of packages linux-image-3.16.0-4-amd64 is related to:
pn  firmware-atheros        <none>
ii  firmware-bnx2           0.43
ii  firmware-bnx2x          0.43
pn  firmware-brcm80211      <none>
pn  firmware-intelwimax     <none>
pn  firmware-ipw2x00        <none>
pn  firmware-ivtv           <none>
pn  firmware-iwlwifi        <none>
pn  firmware-libertas       <none>
pn  firmware-linux          <none>
pn  firmware-linux-nonfree  <none>
pn  firmware-myricom        <none>
pn  firmware-netxen         <none>
pn  firmware-qlogic         <none>
pn  firmware-ralink         <none>
pn  firmware-realtek        <none>
pn  xen-hypervisor          <none>

-- debconf information:
  linux-image-3.16.0-4-amd64/postinst/mips-initrd-3.16.0-4-amd64:
  linux-image-3.16.0-4-amd64/prerm/removing-running-kernel-3.16.0-4-amd64: true
  linux-image-3.16.0-4-amd64/postinst/depmod-error-initrd-3.16.0-4-amd64: false
