On Mon, Nov 06, 2017 at 07:45:23PM +0000, Bart Van Assche wrote:
> On Sat, 2017-11-04 at 08:19 -0600, Jens Axboe wrote:
> > On 11/03/2017 07:55 PM, Ming Lei wrote:
> > > It is very expensive to atomic_inc/atomic_dec the host wide counter of
> > > host->busy_count, and it should have been avoided via blk-mq's mechanism
> > > of getting driver tag, which uses the more efficient way of sbitmap queue.
> > >
> > > Also we don't check atomic_read(&sdev->device_busy) in scsi_mq_get_budget()
> > > and don't run queue if the counter becomes zero, so IO hang may be caused
> > > if all requests are completed just before the current SCSI device
> > > is added to shost->starved_list.
> >
> > This looks like an improvement. I have added it for 4.15.
> >
> > Bart, does this fix your hang?
>
> No, it doesn't. After I had reduced starget->can_queue in the SRP initiator I
> ran into the following hang while running the srp-test software:
>
> sysrq: SysRq : Show Blocked State
> task PC stack pid father
> systemd-udevd D 0 19882 467 0x80000106
> Call Trace:
> __schedule+0x2fa/0xbb0
> schedule+0x36/0x90
> io_schedule+0x16/0x40
> __lock_page+0x10a/0x140
> truncate_inode_pages_range+0x4ff/0x800
> truncate_inode_pages+0x15/0x20
> kill_bdev+0x35/0x40
> __blkdev_put+0x6d/0x1f0
> blkdev_put+0x4e/0x130
> blkdev_close+0x25/0x30
> __fput+0xed/0x1f0
> ____fput+0xe/0x10
> task_work_run+0x8b/0xc0
> do_exit+0x38d/0xc70
> do_group_exit+0x50/0xd0
> get_signal+0x2ad/0x8c0
> do_signal+0x28/0x680
> exit_to_usermode_loop+0x5a/0xa0
> do_syscall_64+0x12e/0x170
> entry_SYSCALL64_slow_path+0x25/0x25

I can't reproduce your issue on IB/SRP any more against v4.14-rc4 with
the following patches applied. There is no hang after running all 6 of
your srp-test tests:

88022d7201e9 blk-mq: don't handle failure in .get_budget
826a70a08b12 SCSI: don't get target/host busy_count in scsi_mq_get_budget()
1f460b63d4b3 blk-mq: don't restart queue when .get_budget returns BLK_STS_RESOURCE
358a3a6bccb7 blk-mq: don't handle TAG_SHARED in restart
0df21c86bdbf scsi: implement .get_budget and .put_budget for blk-mq
aeec77629a4a scsi: allow passing in null rq to scsi_prep_state_check()
b347689ffbca blk-mq-sched: improve dispatching from sw queue
de1482974080 blk-mq: introduce .get_budget and .put_budget in blk_mq_ops
63ba8e31c3ac block: kyber: check if there are requests in ctx in kyber_has_work()
7930d0a00ff5 sbitmap: introduce __sbitmap_for_each_set()
caf8eb0d604a blk-mq-sched: move actual dispatching into one helper
5e3d02bbafad blk-mq-sched: dispatch from scheduler IFF progress is made in ->dispatch

If you can reproduce it, please first provide at least the output of
the following command:

find /sys/kernel/debug/block -name tags | xargs cat | grep busy

If any pending requests aren't completed, please also provide the
related info from debugfs showing where those requests are.
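
A minimal sketch for collecting both pieces in one go, as a POSIX shell
function. It assumes debugfs is mounted at /sys/kernel/debug and that
pending requests show up in the per-hctx "dispatch" file and the per-CPU
"rq_list" file of the blk-mq debugfs tree. The function name
collect_blkmq_state is made up for illustration; it is not an existing
tool.

```shell
# Hypothetical helper (not an existing tool): dump the blk-mq debugfs
# state that is most useful for a hang report.
# $1: debugfs block root; defaults to /sys/kernel/debug/block (needs root).
collect_blkmq_state() {
    root="${1:-/sys/kernel/debug/block}"

    # Busy tag counts, same idea as the one-liner: grep picks the
    # "busy=" line out of every "tags" file.
    find "$root" -name tags -type f -exec cat {} + | grep busy

    # Requests still parked on an hctx dispatch list or on a per-CPU
    # software queue. Debugfs reports a size of 0 for these files, so
    # check the content instead of using [ -s ].
    find "$root" \( -name dispatch -o -name rq_list \) -type f |
    while read -r f; do
        content=$(cat "$f")
        if [ -n "$content" ]; then
            printf '== %s\n%s\n' "$f" "$content"
        fi
    done
}
```

Run collect_blkmq_state as root while the hang is in progress and
attach its output; only non-empty dispatch/rq_list files are printed.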
--
Ming