On 12/6/18 9:18 PM, Mike Snitzer wrote:
> On Thu, Dec 06 2018 at 11:06pm -0500,
> Jens Axboe <[email protected]> wrote:
> 
>> On 12/6/18 8:54 PM, Mike Snitzer wrote:
>>> On Thu, Dec 06 2018 at  9:49pm -0500,
>>> Jens Axboe <[email protected]> wrote:
>>>
>>>> After the direct dispatch corruption fix, we permanently disallow direct
>>>> dispatch of non read/write requests. This works fine off the normal IO
>>>> path, as they will be retried like any other failed direct dispatch
>>>> request. But for the blk_insert_cloned_request() that only DM uses to
>>>> bypass the bottom level scheduler, we always first attempt direct
>>>> dispatch. For some types of requests, that's now a permanent failure,
>>>> and no amount of retrying will make that succeed.
>>>>
>>>> Use the driver private RQF_DONTPREP to track this condition in DM. If
>>>> we encounter a BUSY condition from blk_insert_cloned_request(), then
>>>> flag the request with RQF_DONTPREP. The next time we see this request,
>>>> ask blk_insert_cloned_request() to bypass insert the request directly.
>>>> This avoids the livelock of repeatedly trying to direct dispatch a
>>>> request, while still retaining the BUSY feedback loop for blk-mq so
>>>> that we don't over-dispatch to the lower level queue and mess up
>>>> opportunities for merging on the DM queue.
>>>>
>>>> Fixes: ffe81d45322c ("blk-mq: fix corruption with direct issue")
>>>> Reported-by: Bart Van Assche <[email protected]>
>>>> Cc: [email protected]
>>>> Signed-off-by: Jens Axboe <[email protected]>
>>>>
>>>> ---
>>>>
>>>> This passes my testing as well, like the previous patch. But unlike the
>>>> previous patch, we retain the BUSY feedback loop information for better
>>>> merging.
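
For the record, the DM side of that scheme looks roughly like the sketch
below. This is only an illustration of the flow, not the patch itself;
the function name and the extra "bypass" argument to
blk_insert_cloned_request() are made up here:

/*
 * Dispatch a clone to the underlying queue.  On the first BUSY return,
 * mark the original request with RQF_DONTPREP; when it comes back
 * around, ask for a plain bypass insert instead of another direct
 * dispatch attempt.
 */
static blk_status_t dm_dispatch_clone(struct request *clone,
                                      struct request *rq)
{
        bool bypass = rq->rq_flags & RQF_DONTPREP;
        blk_status_t ret;

        ret = blk_insert_cloned_request(clone->q, clone, bypass);
        if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
                rq->rq_flags |= RQF_DONTPREP;
        return ret;
}
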
>>>
>>> But it is kind of gross to work around the new behaviour to "permanently
>>> disallow direct dispatch of non read/write requests" by always failing
>>> such requests back to DM for later immediate direct dispatch.  That
>>> bouncing of the request was acceptable when there was load-based
>>> justification for having to retry (and in doing so: taking the cost of
>>> freeing the clone request gotten via get_request() from the underlying
>>> request_queues).
>>>
>>> Having to retry like this purely because the request isn't a read or
>>> write seems costly: every non-read-write request will incur this
>>> request_queue bouncing.  In multipath's case: it could select an
>>> entirely different underlying path the next time it is destaged (with
>>> RQF_DONTPREP set).  Which you'd think would negate all hope of IO
>>> merging based performance improvements -- but that is a tangent I'll
>>> need to ask Ming about (again).
>>>
>>> I really don't like this business of bouncing requests as a workaround
>>> for the recent implementation of the corruption fix.
>>>
>>> Why not just add an override flag to _really_ allow direct dispatch for
>>> _all_ types of requests?
>>>
>>> (just peeked at linux-block and it is looking like you took 
>>> jianchao.wang's series to avoid this hack... ;)
>>>
>>> Awesome.. my work is done for tonight!
>>
>> The whole point is doing something that is palatable to 4.20 and leaving
>> the more experimental stuff to 4.21, where we have some weeks to verify
>> that there are no conditions that cause IO stalls. I don't envision there
>> will be, but I'm not willing to risk it this late in the 4.20 cycle.
>>
>> That said, this isn't a quick and dirty fix, and I don't think it's fair
>> calling this a hack. Using RQF_DONTPREP is quite common in drivers to
>> retain state over multiple ->queue_rq invocations. Using it to avoid
>> multiple direct dispatch failures (and obviously this new livelock)
>> seems fine to me.
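
(For anyone not familiar with that pattern, it's basically this in a
driver's ->queue_rq -- a generic sketch only, the foo_* helpers are
made-up names:)

static blk_status_t foo_queue_rq(struct blk_mq_hw_ctx *hctx,
                                 const struct blk_mq_queue_data *bd)
{
        struct request *rq = bd->rq;

        if (!(rq->rq_flags & RQF_DONTPREP)) {
                /* one-time prep, skipped if the request is requeued */
                if (foo_prep_rq(rq))
                        return BLK_STS_RESOURCE;
                rq->rq_flags |= RQF_DONTPREP;
        }

        /* issue to hardware; the prep state above survives a requeue */
        return foo_issue_rq(rq);
}
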
> 
> But it bounces IO purely because it isn't a read or write.  That results
> in guaranteed multiple blk_get_request() calls -- from the underlying
> request_queues that request-based DM is stacked on -- for every
> non-read-write IO that is cloned.  That seems pathological.  I must
> still be missing something.
> 
>> I really don't want to go around and audit every driver for potential
>> retained state over special commands; that's why the read+write thing is
>> in place. It's the safe option, which is what we need right now.
> 
> Maybe leave the blk_mq_request_issue_directly() interface as it is,
> non-read-write restriction and all, but export a new
> __blk_mq_request_issue_directly() that _only_
> blk_insert_cloned_request() -- and future comparable users -- make use
> of?
> 
> To me that is the best of both worlds: Fix corruption issue but don't
> impose needless blk_get_request() dances for non-read-write IO issued to
> dm-multipath.

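Reading that suggestion as code, it would be something like the sketch
below (just my reading of it -- the split and the declaration are
illustrative, not an actual interface):

/* unrestricted direct issue, for blk_insert_cloned_request() only */
blk_status_t __blk_mq_request_issue_directly(struct request *rq);

/* the public helper keeps the non-read/write restriction */
blk_status_t blk_mq_request_issue_directly(struct request *rq)
{
        if (!blk_rq_can_direct_dispatch(rq))
                return BLK_STS_RESOURCE;  /* caller falls back to insert */
        return __blk_mq_request_issue_directly(rq);
}
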
Alright, I hear your point. Maybe we are better off just inserting
straight to dispatch. Here's a version of that on top of -git: it
basically reverts the previous one and applies the two hunks from Ming,
but also adds a missed one in blk_mq_try_issue_list_directly(), where we
don't want to be adding the current request back to the list. That
should go to dispatch, too.

Note note note - not tested yet. Will do so now.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3262d83b9e07..89270a924ba8 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -1736,18 +1736,6 @@ static blk_status_t __blk_mq_issue_directly(struct blk_mq_hw_ctx *hctx,
        return ret;
 }
 
-/*
- * Don't allow direct dispatch of anything but regular reads/writes,
- * as some of the other commands can potentially share request space
- * with data we need for the IO scheduler. If we attempt a direct dispatch
- * on those and fail, we can't safely add it to the scheduler afterwards
- * without potentially overwriting data that the driver has already written.
- */
-static bool blk_rq_can_direct_dispatch(struct request *rq)
-{
-       return req_op(rq) == REQ_OP_READ || req_op(rq) == REQ_OP_WRITE;
-}
-
 static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
                                                struct request *rq,
                                                blk_qc_t *cookie,
@@ -1769,7 +1757,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
                goto insert;
        }
 
-       if (!blk_rq_can_direct_dispatch(rq) || (q->elevator && !bypass_insert))
+       if (q->elevator && !bypass_insert)
                goto insert;
 
        if (!blk_mq_get_dispatch_budget(hctx))
@@ -1785,7 +1773,7 @@ static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
        if (bypass_insert)
                return BLK_STS_RESOURCE;
 
-       blk_mq_sched_insert_request(rq, false, run_queue, false);
+       blk_mq_request_bypass_insert(rq, run_queue);
        return BLK_STS_OK;
 }
 
@@ -1801,7 +1789,7 @@ static void blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
 
        ret = __blk_mq_try_issue_directly(hctx, rq, cookie, false);
        if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE)
-               blk_mq_sched_insert_request(rq, false, true, false);
+               blk_mq_request_bypass_insert(rq, true);
        else if (ret != BLK_STS_OK)
                blk_mq_end_request(rq, ret);
 
@@ -1831,15 +1819,12 @@ void blk_mq_try_issue_list_directly(struct blk_mq_hw_ctx *hctx,
                struct request *rq = list_first_entry(list, struct request,
                                queuelist);
 
-               if (!blk_rq_can_direct_dispatch(rq))
-                       break;
-
                list_del_init(&rq->queuelist);
                ret = blk_mq_request_issue_directly(rq);
                if (ret != BLK_STS_OK) {
                        if (ret == BLK_STS_RESOURCE ||
                                        ret == BLK_STS_DEV_RESOURCE) {
-                               list_add(&rq->queuelist, list);
+                               blk_mq_request_bypass_insert(rq, true);
                                break;
                        }
                        blk_mq_end_request(rq, ret);

-- 
Jens Axboe
