On Fri, Dec 01, 2017 at 07:52:14PM +0000, Bart Van Assche wrote:
> On Fri, 2017-12-01 at 10:58 +0800, Ming Lei wrote:
> > On Thu, Nov 30, 2017 at 04:08:45PM -0800, Bart Van Assche wrote:
> > > blk_mq_sched_mark_restart_hctx() must be called before
> >
> > Could you please describe the theory on commit log? Like, why is it
> > a must? and what is the issue to be fixed?
>
> The BLK_MQ_S_SCHED_RESTART test at the end of blk_mq_dispatch_rq_list() can
> only work if BLK_MQ_S_SCHED_RESTART is set before blk_mq_dispatch_rq_list()
> is called.

The theory behind the current use of BLK_MQ_S_SCHED_RESTART is that we
mark it after requests have been added to hctx->dispatch, so that
blk_mq_sched_restart() can see that these requests need to be revisited.
So in theory, we don't need to set it before each dispatch.

With .get_budget()/.put_budget() introduced, things may be a bit
different, because we may need to revisit requests in the scheduler/SW
queue. But we depend on SCSI's own RESTART (in scsi_end_request()) to do
that, so we still don't need this patch.
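
To illustrate the ordering described above, here is a rough user-space
sketch. It is not kernel code: the toy_* names and the struct are made up
for illustration, and the RESTART bit is modelled as a single atomic
boolean. It only shows the idea of marking after requests are parked and
consuming the mark when a completion arrives.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a hardware queue context (not struct blk_mq_hw_ctx). */
struct toy_hctx {
	atomic_bool sched_restart;	/* models BLK_MQ_S_SCHED_RESTART */
	int dispatch_pending;		/* models the number of requests on hctx->dispatch */
	int rerun_count;		/* how many times the queue was re-run */
};

/* A request could not be issued: park it first, then mark RESTART. */
static void toy_park_and_mark(struct toy_hctx *h)
{
	h->dispatch_pending++;
	atomic_store(&h->sched_restart, true);
}

/* A completion arrived: re-run the queue only if RESTART was marked. */
static bool toy_restart(struct toy_hctx *h)
{
	/* Test and clear in one step so that only one completion re-runs. */
	if (!atomic_exchange(&h->sched_restart, false))
		return false;

	h->rerun_count++;
	h->dispatch_pending = 0;	/* the re-run revisits the parked requests */
	return true;
}

/* Small scenario; returns the number of queue re-runs that happened. */
int toy_demo(void)
{
	struct toy_hctx h = { .dispatch_pending = 0, .rerun_count = 0 };

	atomic_init(&h.sched_restart, false);

	toy_restart(&h);	/* nothing parked, nothing marked: no re-run */
	toy_park_and_mark(&h);
	toy_park_and_mark(&h);
	toy_restart(&h);	/* mark is set: one re-run handles both requests */
	toy_restart(&h);	/* mark already consumed: no second re-run */

	return h.rerun_count;
}
```

In this toy model a completion that arrives after the mark always triggers
a re-run, which is the property the current scheme relies on. The model
does not cover the budget case, where requests stay in the scheduler/SW
queue instead.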
> BTW, without this patch every iteration of my test triggers a
> queue stall. With this patch a queue stall only occurs sporadically so I
> think we really need something like this patch.

We need to root-cause your queue stall first; otherwise any change can
only be regarded as a workaround. Could you investigate the issue a bit
and find the exact cause?
>
> > > blk_mq_dispatch_rq_list() is called. Make sure that
> > > BLK_MQ_S_SCHED_RESTART is set before any blk_mq_dispatch_rq_list()
> > > call occurs.
> > >
> > > Fixes: commit b347689ffbca ("blk-mq-sched: improve dispatching from sw
> > > queue")
> >
> > We always mark RESTART state bit just before dispatching from
> > ->dispatch_list,
> > this way has been there before b347689ffbca, which doesn't change this
> > RESTART mechanism, so please explain a bit why it is a fix on commit
> > b347689ffbca.
>
> I'm not completely sure which patch introduced the lockup fixed by this patch
> but I will have another look whether this was really introduced by commit
> b347689ffbca.

Please make sure the 'Fixes' tag is correct.
--
Ming