On Fri, 2017-11-03 at 11:50 +0800, Ming Lei wrote:
> On Fri, Nov 03, 2017 at 02:42:50AM +0000, Bart Van Assche wrote:
> > On Fri, 2017-11-03 at 10:12 +0800, Ming Lei wrote:
> > > [root@ibclient srp-test]# ./run_tests
> > > modprobe: FATAL: Module target_core_mod is in use.
> >
> > LIO must be unloaded before srp-test software is started.
>
> Yeah, I can make this test kick off after running
> 'modprobe -fr ib_isert' first, but looks all tests failed without
> any hang, could you check if that is the expected result?
>
> https://pastebin.com/VXe66Jpg
I see plenty of "... >/sys/.../add_target failed: Connection timed out"
messages. The srp-test scripts use the loopback functionality of IB ports
so one or more IB ports must be active. Have you already checked the IB
ports state, e.g. by running ibv_devinfo?
> Also could you provide some output from debugfs when this hang happens?
> such as:
>
> dispatch/busy/.. under hctxN
This is the output that appears on my test setup if the two patches from
the patch series "block: remove unnecessary RESTART" are applied:
# ~bart/software/infiniband/srp-test/run_tests -f xfs -d -e deadline -r 60
Unloaded the ib_srpt kernel module
Zero-initializing /dev/ram0 ... done
Zero-initializing /dev/ram1 ... done
Zero-initializing /dev/disk/by-id/scsi-33333333000004650 ... done
brw-rw---- 1 root disk 1, 0 Nov 3 08:10 /dev/ram0
/dev/ram0
brw-rw---- 1 root disk 1, 1 Nov 3 08:10 /dev/ram1
/dev/ram1
lrwxrwxrwx 1 root root 9 Nov 3 08:11 /dev/disk/by-id/scsi-33333333000004650 ->
../../sdi
/dev/sdi
Configured SRP target driver
Running test /home/bart/software/infiniband/srp-test/tests/01 ...
id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe800000000000000002c90300a02d51,pkey=7fff,service_id=0x0002c90300a02d50,
>/sys/class/infiniband_srp/
srp-mlx4_0-2/add_target failed: Invalid argument
id_ext=0x0002c90300a02d50,ioc_guid=0x0002c90300a02d50,dgid=fe800000000000000002c90300a02d52,pkey=7fff,service_id=0x0002c90300a02d50,
>/sys/class/infiniband_srp/
srp-mlx4_0-2/add_target failed: Invalid argument
count_luns(): 3 <> 3
Error: unloading kernel module ib_srp failed
Test /home/bart/software/infiniband/srp-test/tests/01 failed
# cat /sys/module/scsi_mod/parameters/use_blk_mq
Y
# find /sys/kernel/debug/block -name busy | xargs cat
# dmesg -c >/dev/null; echo w > /proc/sysrq-trigger; dmesg
[ 709.321750] sysrq: SysRq : Show Blocked State
[ 709.327906] task PC stack pid father
[ 709.335910] kworker/4:1 D 0 54 2 0x80000000
[ 709.342887] Workqueue: srp_remove srp_remove_work [ib_srp]
[ 709.349806] Call Trace:
[ 709.353365] __schedule+0x2fa/0xbb0
[ 709.357865] schedule+0x36/0x90
[ 709.361985] async_synchronize_cookie_domain+0x88/0x130
[ 709.368422] ? finish_wait+0x90/0x90
[ 709.373033] async_synchronize_full_domain+0x18/0x20
[ 709.379184] sd_remove+0x4d/0xc0 [sd_mod]
[ 709.384282] device_release_driver_internal+0x160/0x210
[ 709.390752] device_release_driver+0x12/0x20
[ 709.396130] bus_remove_device+0x100/0x180
[ 709.401332] device_del+0x1d8/0x340
[ 709.405823] __scsi_remove_device+0xfc/0x130
[ 709.411221] scsi_forget_host+0x25/0x70
[ 709.416108] scsi_remove_host+0x79/0x120
[ 709.421114] srp_remove_work+0x90/0x1d0 [ib_srp]
[ 709.426867] process_one_work+0x20a/0x660
[ 709.431971] worker_thread+0x3d/0x3b0
[ 709.436655] kthread+0x13a/0x150
[ 709.440872] ? process_one_work+0x660/0x660
[ 709.446133] ? kthread_create_on_node+0x40/0x40
[ 709.451825] ret_from_fork+0x27/0x40
[ 709.456602] kworker/u66:6 D 0 341 2 0x80000000
[ 709.463371] Workqueue: events_unbound async_run_entry_fn
[ 709.469919] Call Trace:
[ 709.473325] __schedule+0x2fa/0xbb0
[ 709.477850] ? trace_hardirqs_on_caller+0xfb/0x190
[ 709.483847] schedule+0x36/0x90
[ 709.487982] schedule_timeout+0x22c/0x570
[ 709.493115] ? call_timer_fn+0x330/0x330
[ 709.498133] ? wait_for_completion_io_timeout+0xf7/0x180
[ 709.504732] io_schedule_timeout+0x1e/0x50
[ 709.509938] ? io_schedule_timeout+0x1e/0x50
[ 709.515413] wait_for_completion_io_timeout+0x11f/0x180
[ 709.521920] ? wake_up_q+0x80/0x80
[ 709.526376] blk_execute_rq+0x86/0xc0
[ 709.531135] scsi_execute+0xdb/0x1f0
[ 709.536336] sd_revalidate_disk+0xed/0x1c70 [sd_mod]
[ 709.543281] sd_probe_async+0xc3/0x1d0 [sd_mod]
[ 709.549524] async_run_entry_fn+0x38/0x160
[ 709.555503] process_one_work+0x20a/0x660
[ 709.561299] worker_thread+0x3d/0x3b0
[ 709.566688] kthread+0x13a/0x150
[ 709.571613] ? process_one_work+0x660/0x660
[ 709.577561] ? kthread_create_on_node+0x40/0x40
[ 709.583910] ret_from_fork+0x27/0x40
[ 709.589174] kworker/5:2 D 0 386 2 0x80000000
[ 709.596616] Workqueue: kaluad alua_rtpg_work [scsi_dh_alua]
[ 709.604130] Call Trace:
[ 709.608096] __schedule+0x2fa/0xbb0
[ 709.613265] ? trace_hardirqs_on_caller+0xfb/0x190
[ 709.619879] schedule+0x36/0x90
[ 709.624684] schedule_timeout+0x22c/0x570
[ 709.630437] ? call_timer_fn+0x330/0x330
[ 709.636059] ? wait_for_completion_io_timeout+0xf7/0x180
[ 709.643270] io_schedule_timeout+0x1e/0x50
[ 709.649083] ? io_schedule_timeout+0x1e/0x50
[ 709.655159] wait_for_completion_io_timeout+0x11f/0x180
[ 709.662284] ? wake_up_q+0x80/0x80
[ 709.667334] blk_execute_rq+0x86/0xc0
[ 709.672705] scsi_execute+0xdb/0x1f0
[ 709.677938] alua_rtpg_work+0x2b3/0xe8a [scsi_dh_alua]
[ 709.684983] process_one_work+0x20a/0x660
[ 709.690759] worker_thread+0x3d/0x3b0
[ 709.696107] kthread+0x13a/0x150
[ 709.700986] ? process_one_work+0x660/0x660
[ 709.706920] ? kthread_create_on_node+0x40/0x40
[ 709.713316] ret_from_fork+0x27/0x40
[ 709.718809] systemd-udevd D 0 1905 463 0x80000106
[ 709.726285] Call Trace:
[ 709.730309] __schedule+0x2fa/0xbb0
[ 709.735462] schedule+0x36/0x90
[ 709.740270] io_schedule+0x16/0x40
[ 709.745369] __lock_page+0x10a/0x140
[ 709.750691] ? page_cache_tree_insert+0x100/0x100
[ 709.757280] truncate_inode_pages_range+0x4ff/0x800
[ 709.764044] ? buffer_exit_cpu_dead+0x70/0x70
[ 709.770285] ? find_next_bit+0xb/0x10
[ 709.775691] ? cpumask_next+0x1a/0x20
[ 709.781089] ? smp_call_function_many+0x1e3/0x230
[ 709.787598] ? on_each_cpu_mask+0x52/0xa0
[ 709.793353] ? mark_buffer_async_write+0x20/0x20
[ 709.799741] ? buffer_exit_cpu_dead+0x70/0x70
[ 709.805888] ? on_each_cpu_cond+0x99/0xe0
[ 709.811636] truncate_inode_pages+0x15/0x20
[ 709.817536] kill_bdev+0x35/0x40
[ 709.822400] __blkdev_put+0x6d/0x1f0
[ 709.827625] blkdev_put+0x4e/0x130
[ 709.832673] blkdev_close+0x25/0x30
[ 709.837779] __fput+0xed/0x1f0
[ 709.842430] ____fput+0xe/0x10
[ 709.847030] task_work_run+0x8b/0xc0
[ 709.852251] do_exit+0x38d/0xc70
[ 709.857046] ? get_signal+0x2f6/0x8c0
[ 709.862347] do_group_exit+0x50/0xd0
[ 709.867515] get_signal+0x2ad/0x8c0
[ 709.872601] do_signal+0x28/0x680
[ 709.877452] ? blkdev_read_iter+0x35/0x40
[ 709.883108] ? __vfs_read+0xcc/0x130
[ 709.888237] ? vfs_read+0xa0/0x150
[ 709.893200] exit_to_usermode_loop+0x5a/0xa0
[ 709.899109] do_syscall_64+0x12e/0x170
[ 709.904475] entry_SYSCALL64_slow_path+0x25/0x25
[ 709.910797] RIP: 0033:0x7f5dc4e32d11
[ 709.915917] RSP: 002b:00007ffeb0b10588 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 709.925589] RAX: fffffffffffffffc RBX: 000056230da84490 RCX: 00007f5dc4e32d11
[ 709.934785] RDX: 0000000000000040 RSI: 000056230da844b8 RDI: 0000000000000007
[ 709.943987] RBP: 000056230dabc3c0 R08: 0000000000000000 R09: 000056230da84490
[ 709.953202] R10: 000000000000006f R11: 0000000000000246 R12: 000000008d530000
[ 709.962419] R13: 000000008d530040 R14: 000056230dabc410 R15: 0000000000000040