Your message dated Thu, 20 Feb 2025 13:24:44 +0100 (CET)
with message-id <[email protected]>
and subject line Closing this bug (BTS maintenance for src:linux bugs)
has caused the Debian Bug report #979740,
regarding Processes blocked in io_schedule when using disks behind JMB575 JBOD
to be marked as done.

This means that you claim that the problem has been dealt with.
If this is not the case it is now your responsibility to reopen the
Bug report if necessary, and/or fix the problem forthwith.

(NB: If you are a system administrator and have no idea what this
message is talking about, this may indicate a serious mail system
misconfiguration somewhere. Please contact [email protected]
immediately.)


-- 
979740: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=979740
Debian Bug Tracking System
Contact [email protected] with problems
--- Begin Message ---
Package: linux-image-armmp
Version: 5.9.15-1

Dear Maintainer,
since upgrading to the 5.9.0-x-armmp armhf kernels I am seeing
processes getting getting randomly stuck in D-state.
I observed this issue with 5.9.0-1-armmp and 5.9.0-4-armmp.
I did not observe the issue with 5.8.0-3-armmp or any previous kernel.

Caveat: the system is running a debian kernel, but userspace is ubuntu focal. I thought you might want to know anyway, since it looks like a kernel problem.

--

Hardware:
Three HDDs connected via JMB575 SataIII port multiplier to i.MX6q's SataII port. Probe log: ata1.15: Port Multiplier 1.2, 0x197b:0x5755 r0, 5 ports, feat 0x5/0xf

Steps to reproduce: Start a bigger filecopy operation from one disk to another. I can reproduce this problem copying multi-GB files either via cp or rsync. Copy gets blocked in D-state in few hours and a cpu core is filled by io-wait. I've seen rsync getting blocked both reading the source and writing destination.

Details:
As I said, JMB575 PMP is connected to the i.MX6q on-chip SATA controller. All it's ports get configured "SATA link up 3.0 Gbps (SStatus 123 SControl 320)" There are three HDDs connected to it - I am going to call them sda, sdb, sdc. Disks "sda" and "sdb" are partitioned and used for regular ext4 filesystems. Disk "sdc" is backing a dm-integrity device. I am going to call it "dm-sdc".
DM device "dm-sdc" is used for a ext4 filesystem.


Generally the blocked processes arrive via io_schedule or io_schedule_timeout: [<c0c9c5f0>] (io_schedule_timeout) from [<c0c9e20c>] (wait_for_completion_io+0xb4/0x134)
 [<c0c9d05c>] (schedule) from [<c0c9d1d8>] (io_schedule+0x50/0x70)
[<c0c9d188>] (io_schedule) from [<bf640a34>] (dm_integrity_map_continue+0xb28/0xbc0 [dm_integrity]) [<c0c9d188>] (io_schedule) from [<c04eb868>] (wait_on_page_bit_common+0x18c/0x420)
 [<c0c9d188>] (io_schedule) from [<c0704924>] (rq_qos_wait+0x120/0x18c)
 [<c0c9d188>] (io_schedule) from [<c0c9da68>] (bit_wait_io+0x1c/0x68)
[<c0ca11c8>] (schedule_timeout) from [<c0c9c648>] (io_schedule_timeout+0x58/0x78)

I have many full stacktraces, but I am not sure if they are safe to share.
I'd really like to avoid disclosing user data or crypto keys.
I managed to find one that should be safe:
task:cronjob1l.sh state:D stack: 0 pid:28125 ppid: 28124 flags:0x00000000
Backtrace:
[<c0c9c668>] (__schedule) from [<c0c9d0d4>] (schedule+0x78/0xf4)
r10:000000c3 r9:00000002 r8:00000002 r7:c0c9da4c r6:e5a27ae8 r5:e5a26000
 r4:e598be00
[<c0c9d05c>] (schedule) from [<c0c9d1d8>] (io_schedule+0x50/0x70)
 r5:c1304a84 r4:00000000
[<c0c9d188>] (io_schedule) from [<c0c9da68>] (bit_wait_io+0x1c/0x68)
 r5:c1304a84 r4:00000002
[<c0c9da4c>] (bit_wait_io) from [<c0c9d5f8>] (__wait_on_bit+0x70/0xc8)
 r5:c1304a84 r4:e5a27adc
[<c0c9d588>] (__wait_on_bit) from [<c0c9d6f4>] (out_of_line_wait_on_bit+0xa4/0xcc)
 r9:e5a27c30 r8:bf1562a8 r7:00000001 r6:00000000 r5:c5fe43a8 r4:e5a27af4
[<c0c9d650>] (out_of_line_wait_on_bit) from [<c05c6f30>] (__wait_on_buffer+0x40/0x44)
 r4:c00b3b80
[<c05c6ef0>] (__wait_on_buffer) from [<bf109de4>] (ext4_bread+0x124/0x130 [ext4])
 r5:c5fe43a8 r4:c00b3b80
[<bf109cc0>] (ext4_bread [ext4]) from [<bf123f84>] (__ext4_read_dirblock+0x3c/0x45c [ext4])
 r4:e5a27cc0
[<bf123f48>] (__ext4_read_dirblock [ext4]) from [<bf1243f8>] (dx_probe+0x54/0x6d8 [ext4]) r10:000000c3 r9:e5a27c30 r8:00000000 r7:e5a27cc0 r6:d4a5024c r5:c5fe43a8
 r4:e5a27cc0
[<bf1243a4>] (dx_probe [ext4]) from [<bf1262e8>] (__ext4_find_entry+0x418/0x620 [ext4]) r10:000000c3 r9:00000000 r8:00000009 r7:c5fe43a8 r6:d4a5024c r5:e3092800
 r4:e5a27cc0
[<bf125ed0>] (__ext4_find_entry [ext4]) from [<bf126c24>] (ext4_lookup+0xe4/0x2ec [ext4]) r10:000000c3 r9:00000000 r8:00000009 r7:c5fe43a8 r6:c5fe43a8 r5:d4a50220
 r4:e5a27ca0
[<bf126b40>] (ext4_lookup [ext4]) from [<c058d328>] (__lookup_slow+0x98/0x164)
 r9:00000000 r8:00000009 r7:c5fe43a8 r6:e5a27db8 r5:e40df990 r4:d4a50220
[<c058d290>] (__lookup_slow) from [<c05900b8>] (walk_component+0x158/0x1cc)
 r8:00000009 r7:c5fe4430 r6:00000001 r5:e5a27db0 r4:e40df990
[<c058ff60>] (walk_component) from [<c05905a8>] (path_lookupat+0x80/0x1c0)
 r9:00000000 r8:e5a27db0 r7:e5a27ea4 r6:00000001 r5:e5a27db0 r4:00000000
[<c0590528>] (path_lookupat) from [<c0593454>] (filename_lookup+0xb0/0x1ac)
 r7:e5a27ea4 r6:00000001 r5:e5509000 r4:00000001
[<c05933a4>] (filename_lookup) from [<c059369c>] (user_path_at_empty+0x7c/0x98)
 r8:00000800 r7:014c04c4 r6:e5a27ea4 r5:ffffff9c r4:00000001
[<c0593620>] (user_path_at_empty) from [<c0585718>] (vfs_statx+0x7c/0x13c)
 r7:014c04c4 r6:ffffff9c r5:00000001 r4:e5a27ee8
[<c058569c>] (vfs_statx) from [<c0585cf4>] (__do_sys_stat64+0x40/0x80)
r10:000000c3 r9:e5a26000 r8:c03002c4 r7:000000c3 r6:014c051c r5:004c2384
 r4:be94d758
[<c0585cb4>] (__do_sys_stat64) from [<c058610c>] (sys_stat64+0x18/0x1c)
 r4:00000004
[<c05860f4>] (sys_stat64) from [<c03000c0>] (ret_fast_syscall+0x0/0x54)
Exception stack(0xe5a27fa8 to 0xe5a27ff0)
7fa0: 00000004 004c2384 014c04c4 be94d758 be94d758 00000003 7fc0: 00000004 004c2384 014c051c 000000c3 014c0518 00000003 004b0204 004c24ec
7fe0: 000000c3 be94d6e4 b6ee69db b6e6dbe6


What information would be helpful to collect next time this happens?
My attempt to document these mis-behavior follows:

1st occurence:  5.9.0-1-armmp #1 Debian 5.9.1-1
        One recursive copy (cp -ar) reading from dm-sdc, writing to sdb3.
        uninterruptibly sleeping / D-state'd processes:
                cp -ar                                  # no info
                rest unknown                            # no info
        kernel messages:
                one for each hung task
        remediation attempts:
                ls /mnt/dm-sdc                          # un-blocked the system
                note: sdc had to spin up because it was parked

2nd occurence:  5.9.0-1-armmp #1 Debian 5.9.1-1
        One recursive copy (cp -ar) reading from dm-sdc, writing to sdb3.
        uninterruptibly sleeping / D-state'd processes:
                cp -ar                                  # no info
                rest unknown                            # no info
        kernel messages:
                NONE
        remediation attempts:
                ls /home/user                           # !! entered D-state
                ls /                                            # no problem
                ls /opt                                 # no problem
                mplayer --help                          # no problem
                mount                                           # no problem
                /usr/games/caesar                               # no problem
                ls /tmp                                 # no problem
                ls /mnt/sdb3                            # !! entered D-state
                ls /mnt/sda2                            # !! entered D-state
                ls /mnt/dm-sdc                          # un-blocked the system
                note: sdc had to spin up because it was parked

3rd occurence:  5.9.0-1-armmp #1 Debian 5.9.1-1
        One rsync reading from sda2, writing to dm-sdc.
        uninterruptibly sleeping / D-state'd processes:
                rsync(s)                                        # one or both 
rsyncs?
                rest unknown                            # no info
        kernel messages:
                NONE
        remediation attempts:
                ls /proc                                        # no problem
                ls /sys                                 # no problem
                ls /dev                                 # no problem
                ls /run                                 # [tmpfs] no problem
                ls /mnt/sdb3                            # no problem
                ls /mnt/sda2                            # no problem
                ls /home/user                           # no problem
                ls /mnt/dm-sdc                          # un-blocked the system
                note: sdc had to spin up because it was parked

4th occurence:  5.9.0-1-armmp #1 Debian 5.9.1-1
        One rsync reading from sda2, writing to dm-sdc.
        uninterruptibly sleeping / D-state'd processes:
                kworker/3:13+dm-integrity-writer        # dm-sdc
                rsync                                           # only one, but 
which one?
                kworker/u8:0+dm-integrity-wait  # dm-sdc
                kworker/0:0+dm-integrity-commit # dm-sdc
                jbd2/dm-0-8                                     # dm-sdc
                /lib/systemd/systemd-journald           # no info
        kernel messages:
                NONE
        remediation attempts:
                ls /mnt/dm-sdc                          # no problem
                ls /mnt/dm-sdc/folder                   # un-blocked the system
                note: sdc had to spin up because it was parked

<kernel upgraded, system rebooted>

5th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
        One rsync -c --dry-run reading from all three disks. No writing!
        uninterruptibly sleeping / D-state'd processes:
                rsync(s)                                        # no info
                kworker/3:1+dm-integrity-offload        # dm-sdc
                kworker/3:7+dm-integrity-offload        # dm-sdc
                rest unknown                            # no futher info
        kernel messages:
                one for each hung task
        remediation attempts:
                ls /mnt/dm-sdc                          # !! entered D-state
        uninterruptibly sleeping / D-state'd processes:
                rsync(s)                                        # no info
                kworker/3:1+dm-integrity-offload        # dm-sdc
                kworker/3:7+dm-integrity-offload        # dm-sdc
                kworker/0:0+dm-integrity-offload        # dm-sdc, !! NEW
                rest unknown                            # no info
        remediation attempts:
                ls /home/user                           # no problem
                sha256sum /dev/sdc                      # !! no help, killed by 
Ctrl+C
                dd if=/dev/sdc                          # !! entered D-state
                smartctl /dev/sdc                               # !! entered 
D-state
                umount /mnt/sdb3                                # failed, rsync 
has a file open
                hdparm -Y /dev/sda                      # un-blocked the system
                NOTE: sdc did NOT seem parked this time

6th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
One rsync -c --dry-run thoroughly verifying previous copies - no writing!
        uninterruptibly sleeping / D-state'd processes:
                rsync - reading side                    # sdb3, reading 400M 
file
                rsync - comparing side                  # dm-sdc,reading 900M 
file
                kworker/2:16+dm-integrity-offload       # dm-sdc
                kworker/2:0+dm-integrity-offload        # dm-sdc
                jbd2/sda2-8                                     # sda2
                /bin/sh writing to file on sda2 # sda2
                /bin/sh creating a file on sda2 # sda2
        kernel messages:
                "failed to read SCR 1" for all PMP ports
                IO errors on sda
        remediation attempts:
                df -h                                           # no problem
                ls /mnt/dm-sdc                          # !! entered D-state
                ls /home/user                           # un-blocked some, but 
not all!?
        uninterruptibly sleeping / D-state'd processes:
                rsync - comparing side                  # dm-sdc, reading 900M 
file
                kworker/2:16+dm-integrity-offload       # dm-sdc
                kworker/2:0+dm-integrity-offload        # dm-sdc
                kworker/3:0+dm-integrity-offload        # dm-sdc
                ls /mnt/dm-sdc                          # dm-sdc
        remediation attempts:
                ls /mnt/sdb3/path1                      # no problem
                ls /mnt/sdb3/path2                      # no problem
                md5sum /mnt/sdb3/file1                  # no problem
                md5sum /mnt/sdb3/file2                  # no problem
                echo x >/proc/sysrq-trigger          # no problem
        kernel messages:
                one for each hung task
        remediation attempts:
                # none, system unblocked itself after a while!
        kernel messages:
                "failed to read SCR 1" for all PMP ports
                IO errors on sda

<system rebooted>

7th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
        One rsync reading from sda2, writing to dm-sdc.
        uninterruptibly sleeping / D-state'd processes:
                jbd2/dm-0-8                                     # dm-sdc
                rsync - writing side                    # dm-sdc, writing >100G 
file
                kworker/2:0+dm-integrity-writer # dm-sdc
                kworker/0:0+dm-integrity-commit # dm-sdc
                kworker/u8:2+flush-254:0                # dm-0 -> dm-sdc
                kworker/u8:1+dm-integrity-wait  # dm-sdc
                kworker/3:1+dm_bufio_cache              # dm-sdc
                kworker/0:3+dm_bufio_cache              # dm-sdc
        kernel messages:
                one for each hung task
        remediation attempts:
                df -h                                           # no problem
                ls /mnt/dm-sdc                          # un-blocked the system
                note: sdc had to spin up because it was parked

8th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
        Happened just minutes after "solving" #7
        uninterruptibly sleeping / D-state'd processes:
                jbd2/dm-0-8                                     # dm-sdc
                631/rsync - writing side                # dm-sdc
                kworker/1:3+dm-integrity-writer # dm-sdc
                kworker/3:2+dm-integrity-commit # dm-sdc
                kworker/u8:2+flush-254:0                # dm-0 -> dm-sdc
                kworker/u8:1+dm-integrity-wait  # dm-sdc
        kernel messages:
                NONE
        remediation attempts:
                df -h                                           # no problem
                ls /mnt/dm-sdc                          # no problem
                ls /mnt/dm-sdc/dir1                     # no problem
                ls /mnt/dm-sdc/dir1/d2/                 # un-blocked the system
                note: not sure if sdc was spinning up or not

9th occurence:  5.9.0-4-armmp #1 Debian 5.9.11-1
Happened within tens of minutes after solving #8 (rsyncing the same file!)
        uninterruptibly sleeping / D-state'd processes:
                629/rsync - reading side                # sda2
                kworker/1:0+dm-integrity-commit # dm-sdc
                kworker/1:1+dm-integrity-writer # dm-sdc
                jbd2/dm-0-8                                     # dm-sdc
                kworker/0:0+dm_bufio_cache              # dm-sdc
                /bin/sh - cronjob1 writing a file       # ?
                /bin/sh - cronjob2 writing a file       # ?
        kernel messages:
                NONE
        remediation attempts:
                df -h                                           # no problem
                ls /mnt/dm-sdc                          # no problem
                ls /mnt/dm-sdc/dir1                     # no problem
                ls /mnt/dm-sdc/dir1/d2/                 # no problem
                ls /mnt/dm-sdc/dir1/d2/d3/d4/           # un-blocked the system
                note: not sure if sdc was spinning up or not

10th, 11th, 12th occurence: 5.9.0-4-armmp #1 Debian 5.9.11-1
        Still rsyncing that the same >100G file
always remediated by 'find /mnt/sda & find /mnt/sdb & find /mnt/dm-sdc &'

13th occurence: 5.9.0-4-armmp #1 Debian 5.9.11-1
        Still rsyncing that one >100G file
        uninterruptibly sleeping / D-state'd processes:
                629/rsync - reading side                # sda2
                jbd2/dm-0-8                                     # dm-sdc
                kworker/2:15+dm-integrity-commit        # dm-sdc
                kworker/2:8+dm-integrity-writer # dm-sdc
                jbd2/sda2-8                                     # sda2
                kworker/u8:1+flush-8:0                  # ?
                /bin/sh - cronjob0 writing a file       # started from sda2
                /bin/sh - cronjob0 writing a file       # started from sda2
        The blocked shellscripts are odd. Their open FDs look like this:
                0 -> pipe:[7750528]
                1 -> /tmp/#713 (deleted)
                10 -> /mnt/sda2/dir/cronjob0.sh
                2 -> /tmp/#713 (deleted)
        kernel messages:
                one for each hung task
        remediation attempts:
                find /mnt/sda2                          # !! entered D-state
                find /mnt/sdb3                          # !! entered D-state
                find /mnt/dm-sdc                                # un-blocked 
the system
                note: sdc had to spin up because it was parked

13th occurence: 5.9.0-4-armmp #1 Debian 5.9.11-1
        Still rsyncing that one >100G file
        uninterruptibly sleeping / D-state'd processes:
                629/rsync - reading side                # sda2
                jbd2/dm-0-8                                     # dm-sdc
                kworker/2:0+dm-integrity-commit # dm-sdc
                kworker/3:3+dm-integrity-writer # dm-sdc
        kernel messages:
                one for each hung task
        remediation attempts:
                find /mnt/sda2                          # !! entered D-state
                find /mnt/sdb3                          # !! entered D-state
                find /mnt/dm-sdc                                # un-blocked 
the system
                note: not sure if sdc was spinning up or not

Cheers,
n-b-f

--- End Message ---
--- Begin Message ---
Hi

This bug was filed for a very old kernel or the bug is old itself
without resolution.

If you can reproduce it with

- the current version in unstable/testing
- the latest kernel from backports

please reopen the bug, see https://www.debian.org/Bugs/server-control
for details.

Regards,
Salvatore

--- End Message ---

Reply via email to