[Kernel-packages] [Bug 1662673] Re: systemd-udevd hung in blk_mq_freeze_queue_wait testing unpartitioned NVMe drive

Steven Haber Mon, 15 Jan 2018 11:31:38 -0800

I am hitting this bug on 16.04 LTS running kernel 4.13.0-16.
systemd-udevd is stuck in the following stack:


[<ffffffff906309eb>] blk_mq_freeze_queue_wait+0x4b/0xb0
[<ffffffff90631f4a>] blk_mq_freeze_queue+0x1a/0x20
[<ffffffffc03d676a>] __nvme_revalidate_disk+0x7a/0x3f0 [nvme_core]
[<ffffffffc03d7bc3>] nvme_revalidate_disk+0x53/0x90 [nvme_core]
[<ffffffff9063b72d>] rescan_partitions+0x8d/0x330
[<ffffffff906374f5>] __blkdev_reread_part+0x65/0x70
[<ffffffff90637523>] blkdev_reread_part+0x23/0x40
[<ffffffff90637ef7>] blkdev_ioctl+0x387/0x910
[<ffffffff9049253d>] block_ioctl+0x3d/0x50
[<ffffffff90467521>] do_vfs_ioctl+0xa1/0x5f0
[<ffffffff90467ae9>] SyS_ioctl+0x79/0x90
[<ffffffff90b0edfb>] entry_SYSCALL_64_fastpath+0x1e/0xa9
[<ffffffffffffffff>] 0xffffffffffffffff

And the process info:

4 D root        797      1  0  80   0 - 11661 blk_mq 03:04 ?        00:00:02 /lib/systemd/systemd-udevd
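
For anyone who wants to check for the same symptom, here is a minimal sketch that lists tasks in uninterruptible sleep (state D) together with their wait channel, which is the same information as the ps line above. It assumes a Linux /proc layout; the helper names are illustrative only and not part of any existing tool. Reading /proc/<pid>/stack for the full trace typically requires root.

```python
import os


def parse_stat(text):
    """Split one /proc/<pid>/stat line into (pid, comm, state)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    lparen = text.index("(")
    rparen = text.rindex(")")
    pid = int(text[:lparen].strip())
    comm = text[lparen + 1:rparen]
    state = text[rparen + 1:].split()[0]
    return pid, comm, state


def read_text(path):
    """Return the stripped contents of path, or '' if it cannot be read."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return ""


def blocked_tasks(proc_root="/proc"):
    """Return (pid, comm, wchan) for every task currently in state D."""
    tasks = []
    if not os.path.isdir(proc_root):
        return tasks
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        stat = read_text(os.path.join(proc_root, entry, "stat"))
        if not stat:
            continue  # task exited between listdir() and the read
        try:
            pid, comm, state = parse_stat(stat)
        except (ValueError, IndexError):
            continue
        if state == "D":
            wchan = read_text(os.path.join(proc_root, entry, "wchan"))
            tasks.append((pid, comm, wchan))
    return tasks


if __name__ == "__main__":
    for pid, comm, wchan in blocked_tasks():
        print(pid, comm, wchan)
```

A task that stays in this list across repeated runs with a wait channel starting with blk_mq would match the hang described here.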

We have a number of read-only parted and blockdev jobs backing up behind
the kernel hang (and possibly causing it in the first place):
root      17317      1  0 03:17 ?        00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root      36839  36832  0 05:39 ?        00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root      37181  37143  0 05:50 ?        00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root      37340  37333  0 06:00 ?        00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root      38585  38549  0 08:29 ?        00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root      38742  38735  0 08:39 ?        00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root      40022  39986  0 11:14 ?        00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root      40184  40177  0 11:24 ?        00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root      41456  41419  0 13:59 ?        00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root      41615  41608  0 14:09 ?        00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print
root      42905  42869  0 16:44 ?        00:00:00 /sbin/blockdev --getsize64 /dev/nvme0n1
root      43062  43054  0 16:54 ?        00:00:00 /sbin/parted.rw -m -s -- /dev/nvme0n1 unit B print

These are NVMe drives with a GPT and two partitions. Let me know if you
need more info.

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1662673

Title:
  systemd-udevd hung in blk_mq_freeze_queue_wait testing unpartitioned
  NVMe drive

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Xenial:
  Fix Released
Status in linux source package in Yakkety:
  Fix Released
Status in linux source package in Zesty:
  Fix Released

Bug description:
  For reference, here is the stack of systemd-udevd seen in the hang:

  [ 1558.214013] INFO: task systemd-udevd:1778 blocked for more than 120 seconds.
  [ 1558.214318] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [ 1558.214556] systemd-udevd   D 00003fff8dbdf7a0     0  1778      1 0x00040000
  [ 1558.214637] Call Trace:
  [ 1558.214673] [c000000004ad3790] [c0000000007aac20] schedule_timeout+0x180/0x2f0 (unreliable)
  [ 1558.214779] [c000000004ad3960] [c0000000000158d0] __switch_to+0x200/0x350
  [ 1558.214870] [c000000004ad39c0] [c0000000007adbb4] __schedule+0x414/0x9e0
  [ 1558.214961] [c000000004ad3a90] [c0000000003b4e54] blk_mq_freeze_queue_wait+0x64/0xd0
  [ 1558.215107] [c000000004ad3af0] [d000000034011964] nvme_revalidate_disk+0xd4/0x3a0 [nvme]
  [ 1558.215386] [c000000004ad3b90] [c0000000003c2398] rescan_partitions+0x98/0x390
  [ 1558.215508] [c000000004ad3c60] [c0000000003bb7ac] __blkdev_reread_part+0x9c/0xd0
  [ 1558.215599] [c000000004ad3c90] [c0000000003bb818] blkdev_reread_part+0x38/0x70
  [ 1558.215935] [c000000004ad3cc0] [c0000000003bc334] blkdev_ioctl+0x3b4/0xb80
  [ 1558.216016] [c000000004ad3d20] [c0000000002cbcd0] block_ioctl+0x70/0x90
  [ 1558.216114] [c000000004ad3d40] [c000000000296b38] do_vfs_ioctl+0x458/0x740
  [ 1558.216192] [c000000004ad3dd0] [c000000000296ee4] SyS_ioctl+0xc4/0xe0
  [ 1558.216275] [c000000004ad3e30] [c00000000000a17c] system_call+0x38/0xb4

  It appears that systemd-udevd is triggered every time HTX writes to
  the boot sector (partition table) of the raw drive, and this causes
  the revalidate calls that expose the issue with the blk-mq queue
  freeze. With a partition table on each drive, HTX will no longer be
  writing the partition table and no longer triggering systemd to
  re-read the partition table and try to freeze I/O.
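
  Both traces in this thread reach blkdev_reread_part through blkdev_ioctl, which is the path taken by the BLKRRPART ioctl. As a rough illustration of that request (not a reproducer, and not taken from udev's source), the sketch below issues the same ioctl from user space. The function name is made up for this example, and the 0x125F constant is the x86_64 encoding of _IO(0x12, 95); some architectures, powerpc among them, encode _IO differently. The call needs root, and on an affected kernel it may block in the same way as the tasks above.

```python
import fcntl
import os

# BLKRRPART is defined as _IO(0x12, 95) in <linux/fs.h>.
# 0x125F is the x86_64 value; this is an assumption for other architectures.
BLKRRPART = 0x125F


def reread_partition_table(device):
    """Ask the kernel to re-read the partition table of a whole-disk device.

    Raises OSError if the device cannot be opened or the ioctl is rejected.
    """
    fd = os.open(device, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, BLKRRPART)
    finally:
        os.close(fd)
```

  For example, reread_partition_table("/dev/nvme0n1") would request a rescan of the drive mentioned in this report.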

  The fix for this is provided by the following upstream commit:

  966d2b0 percpu-refcount: fix reference leak during percpu-atomic
  transition

  which needs to be pulled into 16.04 (as well as newer releases).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1662673/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
