On Thu, May 17, 2018 at 06:21:40PM +0000, Bart Van Assche wrote:
> On Thu, 2017-10-19 at 16:21 -0400, Josef Bacik wrote:
> > +   blk_mq_start_request(req);
> >     if (unlikely(nsock->pending && nsock->pending != req)) {
> >             blk_mq_requeue_request(req, true);
> >             ret = 0;
> 
> (replying to an e-mail from seven months ago)
> 
> Hello Josef,
> 
> Are you aware that the nbd driver is one of the very few block drivers that
> calls blk_mq_requeue_request() after a request has been started? I think that
> can lead to the block layer core to undesired behavior, e.g. that the timeout
> handler fires concurrently with a request being reinstered. Can you or a
> colleague have a look at this? I would like to add the following code to the
> block layer core and I think that the nbd driver would trigger this warning:
> 
>  void blk_mq_requeue_request(struct request *rq, bool kick_requeue_list)
>  {
> +       WARN_ON_ONCE(old_state != MQ_RQ_COMPLETE);
> +
>         __blk_mq_requeue_request(rq);
> 

Yup I can tell you why, on 4.11 where I originally did this work
__blk_mq_requeue_request() did this

static void __blk_mq_requeue_request(struct request *rq)
{
        struct request_queue *q = rq->q;

        trace_block_rq_requeue(q, rq);
        rq_qos_requeue(q, &rq->issue_stat);
        blk_mq_sched_requeue_request(rq);

        if (test_and_clear_bit(REQ_ATOM_STARTED, &rq->atomic_flags)) {
                if (q->dma_drain_size && blk_rq_bytes(rq))
                        rq->nr_phys_segments--;
        }
}

So it was clearing the started part when it did the requeue.  If that's not what
I'm supposed to be doing anymore then I can send a patch to fix it.  What is
supposed to be done if I did already do blk_mq_start_request, because I can
avoid doing the start until after that chunk of code, but there's a part further
down that needs to have start done before we reach it, so I'll have to do
whatever the special thing is now there.  Thanks,

Josef

Reply via email to