If I do this:

    # sysctl -w vm.dirty_ratio=0

the hdparm no longer hangs.  It has to be zero; any non-zero value, even 1,
causes large delays in disk-related commands such as hdparm, sync, and
smartctl.

I got this idea from here:

https://serverfault.com/questions/405210/can-high-load-cause-server-hang-and-error-blocked-for-more-than-120-seconds
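
For reference, the related writeback knobs live under /proc/sys/vm.  As far
as I understand it, setting one of the dirty_*_bytes values overrides the
corresponding *_ratio, so capping vm.dirty_bytes at a small value should be a
gentler variant of dirty_ratio=0; the per-device max_ratio files under
/sys/class/bdi are something I have not tried.  A rough sketch (the 64 MiB
cap is just an example number):

    # grep . /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio \
             /proc/sys/vm/dirty_bytes /proc/sys/vm/dirty_background_bytes
    # sysctl -w vm.dirty_bytes=$((64*1024*1024))   # example: cap dirty cache at 64 MiB
    # cat /sys/class/bdi/*/max_ratio               # per-device writeback limits (untried)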

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-meta-hwe-5.15 in Ubuntu.
https://bugs.launchpad.net/bugs/2018687

Title:
  rm -r dir on USB disk locks up hdparm on different disk

Status in linux-meta-hwe-5.15 package in Ubuntu:
  New

Bug description:
  Description:    Ubuntu 20.04.6 LTS
  Linux 5.15.0-72-generic #79~20.04.1-Ubuntu SMP Thu Apr 20 22:12:07 UTC 2023 x86_64 GNU/Linux
  Intel(R) Xeon(R) CPU E5-1620 v4 @ 3.50GHz

  Running "rm -r dir" on a directory with millions of files that resides on
  a disk in an external USB-3 hard drive dock locks up an unrelated hdparm
  process running on an internal disk, such that the kernel says:

      May  7 04:24:02 kernel: [163606.041862] INFO: task hdparm:1391162 blocked for more than 120 seconds.
      [...]
      May  7 04:26:03 kernel: [163726.876357] INFO: task hdparm:1391162 blocked for more than 241 seconds.
      [...]
      May  7 04:28:04 kernel: [163847.702980] INFO: task hdparm:1391162 blocked for more than 362 seconds.

  First a normal run of "hdparm -t /dev/sda" with the offending "rm -r"
  SIGSTOPped so that it doesn't affect anything:

      # \time hdparm -t /dev/sda
      /dev/sda:
       Timing buffered disk reads: 1128 MB in  3.00 seconds = 375.50 MB/sec
      0.01user 0.67system 0:06.21elapsed 11%CPU (0avgtext+0avgdata 4584maxresident)k
      2312704inputs+8outputs (0major+664minor)pagefaults 0swaps

  Elapsed time is about six seconds, as expected.  /dev/sda is an
  internal SSD drive.

  I now run this loop to show the timings and process states below:

      # while sleep 1 ; do  date ; ps laxww | grep '[ ]D' | grep -v refrig ; done

  (I have some processes stopped in a freezer cgroup ("refrig") that I
  don't want to see in the grep output.)  I SIGCONT the offending "rm -r"
  running on the drive in the USB3 drive dock and you see the rm appear
  in uninterruptible sleep along with a couple of kernel processes:

      Sun May  7 05:01:07 EDT 2023
      Sun May  7 05:01:08 EDT 2023
      Sun May  7 05:01:09 EDT 2023
      Sun May  7 05:01:10 EDT 2023
      Sun May  7 05:01:11 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      4     0 1423283   11939  20   0  10648   580 wait_o D+   pts/28     0:00 rm -rf 15tb3
      Sun May  7 05:01:12 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      1     0    4016       1  20   0 161136  1900 usbhid Dsl  ?          1:39 /sbin/apcupsd
      4     0 1423283   11939  20   0  10648   580 wait_o D+   pts/28     0:00 rm -rf 15tb3
      Sun May  7 05:01:13 EDT 2023
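
  Roughly the same D-state listing, minus my freezer-cgroup filtering, can
  also be had from a single ps invocation (a sketch, assuming procps-ng's
  stat and wchan output columns):

      # ps -eo pid,stat,wchan:32,args | awk '$2 ~ /^D/'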

  The above lines showing those processes in uninterruptible sleep repeat
  over and over each second as the "rm -r" continues.  I then start up
  "hdparm -t /dev/sda" on the internal SSD disk, and it also appears in
  uninterruptible sleep and doesn't finish even after minutes of waiting:

      Sun May  7 05:01:25 EDT 2023
      1     0     368       2  20   0      0     0 md_sup D    ?          2:57 [md0_raid5]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:26 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:27 EDT 2023
      [...]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:35 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      4     0 1423283   11939  20   0  11260  2544 blk_mq D+   pts/28     0:00 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:36 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:17 [usb-storage]
      1     0    4985       2  20   0      0     0 rq_qos D    ?          0:24 [jbd2/sdj1-8]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:00 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:01:37 EDT 2023

  I keep waiting.  The above lines repeat over and over; the hdparm stays
  blocked and doesn't finish.

      Sun May  7 05:03:32 EDT 2023
      1     0     447       2  20   0      0     0 usb_sg D    ?          3:18 [usb-storage]
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423283   11939  20   0  11260  2544 wait_o D+   pts/28     0:03 rm -rf 15tb3
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:34 EDT 2023

  At this point, I SIGSTOP the offending "rm -r" and it disappears from
  uninterruptible sleep, but the other processes remain in that state for
  several more seconds until the lockup clears and the hdparm finally
  finishes:

      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:35 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:36 EDT 2023
      1     0    4016       1  20   0 161136  1900 usbhid Dsl  ?          1:39 /sbin/apcupsd
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:37 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:38 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:39 EDT 2023
      1     0 1366783       2  20   0      0     0 blk_mq D    ?          0:02 [kworker/u16:2+flush-8:144]
      4     0 1423501    9975  20   0   4680  4584 wb_wai DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:40 EDT 2023
      Sun May  7 05:03:41 EDT 2023
      Sun May  7 05:03:42 EDT 2023
      Sun May  7 05:03:43 EDT 2023
      4     0 1423501    9975  20   0   4680  4584 wait_o DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:44 EDT 2023
      4     0 1423501    9975  20   0   4680  4584 wait_o DL+  pts/4      0:00 hdparm -t /dev/sda
      Sun May  7 05:03:45 EDT 2023
      Sun May  7 05:03:46 EDT 2023

  And the hdparm is finally done, several minutes after I started it.  If I
  keep the offending "rm -r" running on that USB3 disk, it locks up an hdparm
  (running on a different disk) long enough to trigger the kernel INFO
  messages shown above.
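
  Another way to watch the backlog drain after the SIGSTOP is to watch the
  global dirty and writeback counters (standard fields in /proc/meminfo), for
  example:

      # watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'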

  If I try hdparm on a different disk, e.g. one of the WD Red drives in
  my internal RAID5 array, it also hangs until I SIGSTOP or kill the "rm -r":

      1     0     447       2  20   0      0     0 usb_sg D    ?          3:51 [usb-storage]
      1     0    4985       2  20   0      0     0 blk_mq D    ?          0:31 [jbd2/sdj1-8]
      1     0 1466181       2  20   0      0     0 rq_qos D    ?          0:08 [kworker/u16:4+flush-8:144]
      4     0 1469821   11939  20   0  27524 18888 wait_o D+   pts/28     0:44 rm -rf 15tb3
      4     0 1501584   11933  20   0   4680  4584 rq_qos DL+  pts/26     0:00 hdparm -t /dev/sdb

  While hung, ^C (interrupt) has no effect on the hdparm until I SIGSTOP the
  offending "rm -r"; everything then unlocks after a few seconds.  Is this
  the way things are supposed to work?  I did not expect this behaviour.
  Here's the syslog trace:

      [163968.537842] INFO: task hdparm:1391162 blocked for more than 483 seconds.
      [163968.537853]       Tainted: G           O      5.15.0-72-generic #79~20.04.1-Ubuntu
      [163968.537858] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [163968.537861] task:hdparm          state:D stack:    0 pid:1391162 ppid:1391130 flags:0x00004004
      [163968.537869] Call Trace:
      [163968.537872]  <TASK>
      [163968.537876]  __schedule+0x2cd/0x890
      [163968.537889]  schedule+0x69/0x110
      [163968.537896]  wb_wait_for_completion+0x5a/0x90
      [163968.537907]  ? wait_woken+0x60/0x60
      [163968.537928]  sync_inodes_sb+0xd8/0x2b0
      [163968.537932]  sync_inodes_one_sb+0x19/0x20
      [163968.537933]  iterate_supers+0xa3/0x100
      [163968.537937]  ? __x64_sys_tee+0xe0/0xe0
      [163968.537939]  ksys_sync+0x42/0xa0
      [163968.537941]  __do_sys_sync+0xe/0x20
      [163968.537942]  do_syscall_64+0x5c/0xc0
      [163968.537946]  ? do_syscall_64+0x69/0xc0
      [163968.537948]  ? do_syscall_64+0x69/0xc0
      [163968.537950]  ? do_syscall_64+0x69/0xc0
      [163968.537953]  entry_SYSCALL_64_after_hwframe+0x61/0xcb
      [163968.537956] RIP: 0033:0x7f1d374a822b
      [163968.537958] RSP: 002b:00007ffc973592f8 EFLAGS: 00000202 ORIG_RAX: 00000000000000a2
      [163968.537961] RAX: ffffffffffffffda RBX: 0000000000000d64 RCX: 00007f1d374a822b
      [163968.537962] RDX: 0000000000000000 RSI: 0000000000200000 RDI: 0000000000000003
      [163968.537963] RBP: 0000000000000003 R08: 0000000000000007 R09: 000000000000002a
      [163968.537964] R10: 0000564568bdd017 R11: 0000000000000202 R12: 0000000000000000
      [163968.537965] R13: 00007ffc9735a295 R14: 00007ffc97359370 R15: 00007f1d37193000
      [163968.537967]  </TASK>
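
  For what it's worth, the trace shows hdparm sitting in ksys_sync(), i.e. an
  ordinary sync(2), which on Linux waits for writeback to complete on every
  mounted filesystem, so the dirty pages queued for the slow USB disk end up
  stalling an hdparm aimed at a completely different device.  The same effect
  can be seen without hdparm; a sketch (sync -f is coreutils' wrapper for the
  per-filesystem syncfs(2), and /mnt/ssd is just a stand-in for a mount point
  on the internal disk):

      # \time sync                  # waits behind the USB disk's dirty pages
      # \time sync -f /mnt/ssd      # syncfs(2): flushes only that filesystem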

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: linux-image-generic-hwe-20.04 5.15.0.72.79~20.04.33
  ProcVersionSignature: Ubuntu 5.15.0-72.79~20.04.1-generic 5.15.98
  Uname: Linux 5.15.0-72-generic x86_64
  ApportVersion: 2.20.11-0ubuntu27.26
  Architecture: amd64
  CasperMD5CheckResult: skip
  Date: Sun May  7 08:38:10 2023
  EcryptfsInUse: Yes
  InstallationDate: Installed on 2020-10-07 (941 days ago)
  InstallationMedia: Lubuntu 20.04.1 LTS "Focal Fossa" - Release amd64 (20200731)
  SourcePackage: linux-meta-hwe-5.15
  UpgradeStatus: Upgraded to focal on 2022-08-15 (265 days ago)
