On Tue, 2017-12-05 at 06:42 +0800, Ming Lei wrote:
> On Mon, Dec 04, 2017 at 09:30:32AM -0800, Bart Van Assche wrote:
> > * A systematic lockup for SCSI queues with queue depth 1. The
> > following test reproduces that bug systematically:
> > - Change the SRP initiator such that SCSI target queue depth is
> > limited to 1.
> > - Run the following command:
> > srp-test/run_tests -f xfs -d -e none -r 60 -t 01
> > See also "[PATCH 4/7] blk-mq: Avoid that request processing
> > stalls when sharing tags"
> > (https://marc.info/?l=linux-block&m=151208695316857). Note:
> > reverting commit 0df21c86bdbf also fixes a sporadic SCSI request
> > queue lockup while inserting a blk_mq_sched_mark_restart_hctx()
> > before all blk_mq_dispatch_rq_list() calls only fixes the
> > systematic lockup for queue depth 1.
>
> You are the only reproducer [ ... ]

That's not correct. I'm pretty sure that if you try to reproduce this,
you will see the same hang I ran into. Does this mean that you have not
yet tried to reproduce the hang I reported?

> You said that your patch fixes 'commit b347689ffbca ("blk-mq-sched:
> improve dispatching from sw queue")', but you don't mention any issue
> about that commit.

That's not correct either. From the commit message: "A systematic lockup
for SCSI queues with queue depth 1."

> > I think the above means that it is too risky to try to fix all bugs
> > introduced by commit 0df21c86bdbf before kernel v4.15 is released.
> > Hence revert that commit.
>
> What is the risk?

The risk is that commit 0df21c86bdbf introduced more bugs than the ones
that have been discovered so far.

Bart.