Hello Jens,
If I leave the srp-test software running for a few minutes using the
following command:
# while ~bart/software/infiniband/srp-test/run_tests -d -r 30; do :; done
then after some time the following complaint appears for multiple
kworkers:
INFO: task kworker/9:0:65 blocked for more than 480 seconds.
Tainted: G I 4.11.0-rc4-dbg+ #5
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/9:0 D 0 65 2 0x00000000
Workqueue: dio/dm-0 dio_aio_complete_work
Call Trace:
__schedule+0x3df/0xc10
schedule+0x38/0x90
rwsem_down_write_failed+0x2c4/0x4c0
call_rwsem_down_write_failed+0x17/0x30
down_write+0x5a/0x70
__generic_file_fsync+0x43/0x90
ext4_sync_file+0x2d0/0x550
vfs_fsync_range+0x46/0xa0
dio_complete+0x181/0x1b0
dio_aio_complete_work+0x17/0x20
process_one_work+0x208/0x6a0
worker_thread+0x49/0x4a0
kthread+0x107/0x140
ret_from_fork+0x2e/0x40
I had not yet observed this behavior with kernel v4.10 or older. If this
happens and I check the queue state with the following script:
#!/bin/bash
cd /sys/class/block || exit $?
for dev in *; do
if [ -e "$dev/mq" ]; then
echo "$dev"
for f in "$dev"/mq/*/{pending,*/rq_list}; do
[ -e "$f" ] || continue
if { read -r line1 && read -r line2; } <"$f"; then
echo "$f"
echo "$line1 $line2" >/dev/null
head -n 9 "$f"
fi
done
(
cd /sys/kernel/debug/block >&/dev/null &&
for d in "$dev"/mq/*; do
[ ! -d "$d" ] && continue
grep -q '^busy=0$' "$d/tags" && continue
for f in "$d"/{dispatch,tags*,cpu*/rq_list}; do
[ -e "$f" ] && grep -aH '' "$f"
done
done
)
fi
done
then the following output appears:
sda
sdb
sdc
sdd
sdd/mq/3/dispatch:ffff880401655d00 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=59, .internal_tag=-1}
sdd/mq/3/tags:nr_tags=62
sdd/mq/3/tags:nr_reserved_tags=0
sdd/mq/3/tags:active_queues=0
sdd/mq/3/tags:
sdd/mq/3/tags:bitmap_tags:
sdd/mq/3/tags:depth=62
sdd/mq/3/tags:busy=31
sdd/mq/3/tags:bits_per_word=8
sdd/mq/3/tags:map_nr=8
sdd/mq/3/tags:alloc_hint={23, 23, 52, 1, 55, 29, 17, 22, 34, 48, 25, 49, 37,
43, 58, 25, 6, 20, 50, 14, 55, 7, 21, 17, 26, 36, 43, 43, 4, 6, 3, 47}
sdd/mq/3/tags:wake_batch=7
sdd/mq/3/tags:wake_index=0
sdd/mq/3/tags:ws={
sdd/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sdd/mq/3/tags:}
sdd/mq/3/tags:round_robin=0
sdd/mq/3/tags_bitmap:00000000: ffff ff1f 0000 0018
sdd/mq/3/cpu5/rq_list:ffff880401657440 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=60, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba0000 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=0, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba1740 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=1, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba2e80 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=2, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba45c0 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=3, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba5d00 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=4, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba7440 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=5, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037aba8b80 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=6, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abaa2c0 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=7, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037ababa00 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=8, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abad140 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=9, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88037abae880 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=10, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369900000 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=11, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369901740 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=12, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369902e80 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=13, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8803699045c0 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=14, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369905d00 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=15, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369907440 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=16, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff880369908b80 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=17, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990a2c0 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=18, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990ba00 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=19, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990d140 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=20, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff88036990e880 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=21, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b0000 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=22, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b1740 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=23, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b2e80 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=24, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b45c0 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=25, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b5d00 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=26, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b7440 {.cmd_flags=0xca01, .rq_flags=0x2040,
.tag=27, .internal_tag=-1}
sdd/mq/3/cpu5/rq_list:ffff8804001b8b80 {.cmd_flags=0x7a01, .rq_flags=0x2040,
.tag=28, .internal_tag=-1}
sde
sde/mq/3/tags:nr_tags=62
sde/mq/3/tags:nr_reserved_tags=0
sde/mq/3/tags:active_queues=0
sde/mq/3/tags:
sde/mq/3/tags:bitmap_tags:
sde/mq/3/tags:depth=62
sde/mq/3/tags:busy=31
sde/mq/3/tags:bits_per_word=8
sde/mq/3/tags:map_nr=8
sde/mq/3/tags:alloc_hint={23, 23, 52, 1, 55, 29, 17, 22, 34, 48, 25, 49, 37,
43, 58, 25, 6, 20, 50, 14, 55, 7, 21, 17, 26, 36, 43, 43, 4, 6, 3, 47}
sde/mq/3/tags:wake_batch=7
sde/mq/3/tags:wake_index=0
sde/mq/3/tags:ws={
sde/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sde/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sde/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sde/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sde/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sde/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sde/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sde/mq/3/tags: {.wait_cnt=7, .wait=inactive},
sde/mq/3/tags:}
sde/mq/3/tags:round_robin=0
sde/mq/3/tags_bitmap:00000000: ffff ff1f 0000 0018
sdf
sdg
sdh
sdi
sdj
sr0
I am using the "none" scheduler:
# cat /sys/class/block/sdd/queue/scheduler
[none]
# cat /sys/class/block/sde/queue/scheduler
[none]
What is remarkable is that I see pending requests for the sd* devices
but not for any dm* device and also that the number of busy requests (31)
is exactly half of the queue depth (62). Could this indicate that the
block layer stopped processing these blk-mq queues?
If this happens and I run the following command to trigger SRP logout:
# for p in /sys/class/srp_remote_ports/*; do echo 1 >$p/delete; done
then the test that was running finishes, reports that removing the
multipath device failed and echo w >/proc/sysrq-trigger produces the
following output:
sysrq: SysRq : Show Blocked State
task PC stack pid father
systemd-udevd D 0 14490 508 0x00000106
Call Trace:
__schedule+0x3df/0xc10
schedule+0x38/0x90
io_schedule+0x11/0x40
__lock_page+0x10c/0x140
truncate_inode_pages_range+0x45d/0x780
truncate_inode_pages+0x10/0x20
kill_bdev+0x30/0x40
__blkdev_put+0x71/0x220
blkdev_put+0x49/0x170
blkdev_close+0x20/0x30
__fput+0xe8/0x1f0
____fput+0x9/0x10
task_work_run+0x80/0xb0
do_exit+0x30c/0xc70
do_group_exit+0x4b/0xc0
get_signal+0x2c2/0x930
do_signal+0x23/0x670
exit_to_usermode_loop+0x5d/0xa0
do_syscall_64+0xd5/0x140
entry_SYSCALL64_slow_path+0x25/0x25
Does this indicate that truncate_inode_pages_range() is waiting
because a block layer queue got stuck?
The kernel tree I used in my tests is the result of merging the
following commits:
* commit 3dca2c2f3d3b from git://git.kernel.dk/linux-block.git
("Merge branch 'for-4.12/block' into for-next")
* commit f88ab0c4b481 from
git://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
("scsi: libsas: fix ata xfer length")
* commit ad0376eb1483 from
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
("Merge tag 'edac_for_4.11_2' of
git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp")
Please let me know if you need more information.
Thanks,
Bart.