Public bug reported:
SRU Justification
[Impact]
Hello Canonical Team,
This issue was found while validating CPC's Jammy CVM image. We are up
against a tight timeline to deliver this to a partner on 10/5, so we
would appreciate prioritizing this.
While running fio, the command fails to exit after its 2-minute runtime.
Watching `top` while the command hung, I saw kworkers getting blocked.
sudo fio --ioengine=libaio --bs=4K \
  --filename=/dev/sdc1:/dev/sdd1:/dev/sde1:/dev/sdf1:/dev/sdg1:/dev/sdh1:/dev/sdi1:/dev/sdj1:/dev/sdk1:/dev/sdl1:/dev/sdm1:/dev/sdn1:/dev/sdo1:/dev/sdp1:/dev/sdq1:/dev/sdr1 \
  --readwrite=randwrite --runtime=120 --iodepth=1 --numjob=96 \
  --name=iteration9 --direct=1 --size=8192M --group_reporting \
  --overwrite=1
Example system logs:
---------------------------------------------------------------------------------------------------------------
[ 1096.297641] INFO: task kworker/u192:0:8 blocked for more than 120 seconds.
[ 1096.302785] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
[ 1096.306312] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1096.310489] INFO: task jbd2/sda1-8:1113 blocked for more than 120 seconds.
[ 1096.313900] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
[ 1096.317481] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1096.324117] INFO: task systemd-journal:1191 blocked for more than 120 seconds.
[ 1096.331219] Tainted: G W 5.15.0-1024-azure #30-Ubuntu
[ 1096.335332] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
---------------------------------------------------------------------------------------------------------------
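Hung-task reports like the ones above can be triaged mechanically. As an
illustrative sketch (the regex and function name are mine, not from any
tool used in this report), a small parser that pulls the blocked task
names and PIDs out of dmesg-style lines:

```python
import re

# Matches dmesg hung-task lines such as:
#   [ 1096.297641] INFO: task kworker/u192:0:8 blocked for more than 120 seconds.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def blocked_tasks(log_lines):
    """Return (task_name, pid) pairs for every hung-task report in the log."""
    hits = []
    for line in log_lines:
        m = HUNG_TASK_RE.search(line)
        if m:
            hits.append((m.group("name"), int(m.group("pid"))))
    return hits
```

Fed the log excerpt above, this would report kworker/u192:0 (PID 8),
jbd2/sda1-8 (PID 1113), and systemd-journal (PID 1191) as blocked.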
---------------------------------------------------------------------------------------------------------------
[ 3241.013230] systemd-udevd[1221]: sdl1: Worker [6686] processing SEQNUM=13323 killed
[ 3261.492691] systemd-udevd[1221]: sdl1: Worker [6686] failed
---------------------------------------------------------------------------------------------------------------
TOP report:
---------------------------------------------------------------------------------------------------------------
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
417 root 20 0 0 0 0 R 66.2 0.0 0:34.61 ksoftirqd/59
435 root 20 0 0 0 0 I 24.5 0.0 0:09.03 kworker/59:1-mm_percpu_wq
416 root rt 0 0 0 0 S 23.5 0.0 0:01.86 migration/59
366 root 0 -20 0 0 0 I 19.2 0.0 0:16.64 kworker/49:1H-kblockd
378 root 0 -20 0 0 0 I 17.9 0.0 0:15.71 kworker/51:1H-kblockd
455 root 0 -20 0 0 0 I 17.9 0.0 0:14.76 kworker/62:1H-kblockd
135 root 0 -20 0 0 0 I 17.5 0.0 0:13.08 kworker/17:1H-kblockd
420 root 0 -20 0 0 0 I 16.9 0.0 0:14.63 kworker/58:1H-kblockd
...
---------------------------------------------------------------------------------------------------------------
LISAv3 Testcase: perf_premium_datadisks_4k
Image : "canonical-test 0001-com-ubuntu-confidential-vm-jammy-preview 22_04-lts-cvm latest"
VMSize : "Standard_DC96ads_v5"
Regarding reproducibility: I see this every time I run the storage perf
tests, and it always seems to happen on iteration 9 or 10. When running
the command manually, I had to run it three or four times to reproduce
the issue.
[Test Case]
Tested by Microsoft; reproduction requires many cores (96) and disks (16).
[Where problems could occur]
swiotlb buffers could be double freed.
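As a rough illustration of why a double free in this area is dangerous
(a toy model of my own, not kernel code): swiotlb hands out slots from a
fixed pool of bounce buffers, so leaking slots exhausts the pool and new
DMA mappings stall, while freeing a slot twice would let two mappings
share one buffer:

```python
class BounceBufferPool:
    """Toy model of a fixed-size swiotlb slot pool (not the kernel's algorithm)."""

    def __init__(self, nslots):
        self.free = list(range(nslots))  # indices of free slots
        self.in_use = set()

    def map(self):
        """Claim a slot; returns None when the pool is exhausted (I/O would stall)."""
        if not self.free:
            return None
        slot = self.free.pop()
        self.in_use.add(slot)
        return slot

    def unmap(self, slot):
        """Release a slot; a second release of the same slot is a double free."""
        if slot not in self.in_use:
            # Re-adding an already-free slot would hand it out twice.
            raise RuntimeError(f"double free of slot {slot}")
        self.in_use.remove(slot)
        self.free.append(slot)
```

In this model, if completions are lost and slots are never unmapped,
`map()` soon returns None for every request, matching the symptom of
I/O workers blocking once the bounce buffers run out.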
[Other Info]
SF: #00349781
** Affects: linux-azure (Ubuntu)
Importance: Undecided
Status: New
** Affects: linux-azure (Ubuntu Jammy)
Importance: Critical
Assignee: Tim Gardner (timg-tpi)
Status: In Progress
** Affects: linux-azure (Ubuntu Kinetic)
Importance: Undecided
Status: New
** Package changed: linux (Ubuntu) => linux-azure (Ubuntu)
** Also affects: linux-azure (Ubuntu Kinetic)
Importance: Undecided
Status: New
** Also affects: linux-azure (Ubuntu Jammy)
Importance: Undecided
Status: New
** Changed in: linux-azure (Ubuntu Jammy)
Importance: Undecided => Critical
** Changed in: linux-azure (Ubuntu Jammy)
Status: New => In Progress
** Changed in: linux-azure (Ubuntu Jammy)
Assignee: (unassigned) => Tim Gardner (timg-tpi)
--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1998838
Title:
Azure: Jammy fio test hangs, swiotlb buffers exhausted
Status in linux-azure package in Ubuntu:
New
Status in linux-azure source package in Jammy:
In Progress
Status in linux-azure source package in Kinetic:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1998838/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to : [email protected]
Unsubscribe : https://launchpad.net/~kernel-packages
More help : https://help.launchpad.net/ListHelp