Sorry for bothering you again. ;)
There is a dangerous scenario caused by nvme_wait_freeze in nvme_reset_work;
please consider it.
If the controller gives no response, we have to rely on the timeout path.
There are the issues below:
nvme_dev_disable needs to be invoked. nvme_dev_disable will quiesce the
queues, and cancel and requeue the outstanding requests. nvme_reset_work
will then hang at nvme_wait_freeze.
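To make that hang concrete, below is a minimal user-space model of the freeze
gating, assuming it works like the blk-mq usage counter. queue_enter,
request_complete and wait_freeze are invented names standing in for
blk_queue_enter, request completion and nvme_wait_freeze; this is a sketch,
not the real block-layer code:

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int usage;        /* models q_usage_counter        */
static bool frozen;      /* models the queue-frozen state */

static void queue_enter(void)       /* a request enters the queue */
{
        pthread_mutex_lock(&lock);
        while (frozen)              /* new requests gate here */
                pthread_cond_wait(&cond, &lock);
        usage++;
        pthread_mutex_unlock(&lock);
}

static void request_complete(void)  /* drops the usage reference */
{
        pthread_mutex_lock(&lock);
        usage--;
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
}

static void wait_freeze(void)       /* freeze and wait for drain */
{
        pthread_mutex_lock(&lock);
        frozen = true;
        /* If an entered request is never completed (e.g. it was
         * requeued by nvme_dev_disable and the controller is dead),
         * usage never reaches 0 and this wait blocks forever. */
        while (usage > 0)
                pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
}

int main(void)
{
        queue_enter();              /* one outstanding request     */
        request_complete();         /* without this completion,    */
        wait_freeze();              /* wait_freeze() never returns */
        printf("drained, usage = %d\n", usage);
        return 0;
}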
If we set NVME_REQ_CANCELLED and return BLK_EH_HANDLED, as in the RESETTING
case, nvme_reset_work will hang forever, because no one could complete the
entered requests.
If we modify the state machine to be able to change RESETTING to RECONNECTING
and then invoke nvme_reset_ctrl to queue reset_work, we still cannot move
things forward, because the reset_work is already being executed.
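For reference, a sketch of the transition this would have to allow; the shape
loosely follows nvme_change_ctrl_state, but the snippet is a simplified model
rather than the actual driver code:

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

enum ctrl_state { NVME_CTRL_LIVE, NVME_CTRL_RESETTING, NVME_CTRL_RECONNECTING };

static _Atomic enum ctrl_state state = NVME_CTRL_LIVE;

static bool change_ctrl_state(enum ctrl_state new_state)
{
        enum ctrl_state old = atomic_load(&state);

        switch (new_state) {
        case NVME_CTRL_RESETTING:
                if (old != NVME_CTRL_LIVE)
                        return false;
                break;
        case NVME_CTRL_RECONNECTING:
                /* the transition proposed above */
                if (old != NVME_CTRL_RESETTING)
                        return false;
                break;
        default:
                return false;
        }
        return atomic_compare_exchange_strong(&state, &old, new_state);
}

int main(void)
{
        atomic_store(&state, NVME_CTRL_RESETTING);
        /* The state change itself can be made to succeed ... */
        printf("RESETTING -> RECONNECTING: %d\n",
               change_ctrl_state(NVME_CTRL_RECONNECTING));
        /* ... but the reset_work instance already blocked inside
         * nvme_wait_freeze is not preempted by queueing it again,
         * so the controller still makes no progress. */
        return 0;
}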
If we use nvme_wait_freeze_timeout in nvme_reset_work, we can unfreeze and
return when it expires. But the timeout value is tricky.
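Something with that shape is sketched below, again as a user-space model of a
deadline-based wait (in the kernel this corresponds to
blk_mq_freeze_queue_wait_timeout underneath nvme_wait_freeze_timeout), not the
actual implementation:

#include <errno.h>
#include <pthread.h>
#include <time.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static int usage = 1;   /* one stuck request that never completes */

/* Returns nonzero if the queues drained in time, 0 on timeout. */
static int wait_freeze_timeout(long timeout_ms)
{
        struct timespec deadline;
        int ret = 1;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec  += timeout_ms / 1000;
        deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L) {
                deadline.tv_sec++;
                deadline.tv_nsec -= 1000000000L;
        }

        pthread_mutex_lock(&lock);
        while (usage > 0) {
                if (pthread_cond_timedwait(&cond, &lock,
                                           &deadline) == ETIMEDOUT) {
                        ret = 0; /* expired: unfreeze and return */
                        break;
                }
        }
        pthread_mutex_unlock(&lock);
        return ret;
}

int main(void)
{
        /* With the stuck request above this gives up after ~100ms.
         * The hard part is exactly the point made in the mail: the
         * value must be long enough for a healthy-but-slow device,
         * yet short enough for a dead one. */
        return wait_freeze_timeout(100);
}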
Maybe we could use blk_set_preempt_only, which is also gated in
blk_queue_enter. We needn't drain the queue for it, so it is lightweight.
And nvme needn't worry about a full queue preventing admin requests from
being submitted.
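A minimal sketch of the idea, assuming the gate sits where blk_queue_enter
checks the flag and that requests marked preempt (BLK_MQ_REQ_PREEMPT in the
kernel) bypass it; the names below are invented for the model:

#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static bool preempt_only;

/* Models blk_set_preempt_only(): just set the flag. Nothing is
 * drained and no outstanding request is waited for, which is why
 * this is lightweight compared to a freeze. */
static void set_preempt_only(void)
{
        pthread_mutex_lock(&lock);
        preempt_only = true;
        pthread_mutex_unlock(&lock);
}

static void clear_preempt_only(void)
{
        pthread_mutex_lock(&lock);
        preempt_only = false;
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
}

/* Models the gate in blk_queue_enter(): normal submitters wait
 * while preempt_only is set, but preempt-marked requests still
 * pass the gate. */
static void queue_enter(bool preempt)
{
        pthread_mutex_lock(&lock);
        while (preempt_only && !preempt)
                pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
}

int main(void)
{
        set_preempt_only();
        queue_enter(true);      /* a preempt request still enters   */
        clear_preempt_only();
        queue_enter(false);     /* normal requests resume afterward */
        return 0;
}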
Looking forward to your advice.
On 02/08/2018 09:40 AM, jianchao.wang wrote:
> Hi Keith
> Many thanks for your precious time and kind guidance.
> That's really appreciated. :)
> On 02/08/2018 12:13 AM, Keith Busch wrote:
>> On Wed, Feb 07, 2018 at 10:13:51AM +0800, jianchao.wang wrote:
>>> What's the difference ? Can you please point out.
>>> I have shared my understanding below.
>>> But actually, I don't get what difference you meant.
>> It sounds like you have all the pieces. Just keep this in mind: we don't
>> want to fail IO if we can prevent it.
> Yes, absolutely.
>> A request is allocated from an hctx pool of tags. Once the request is
>> allocated, it is permanently tied to that hctx because that's where its
>> tag came from. If that hctx becomes invalid, the request has to be ended
>> with an error, and we can't do anything about that[*].
>> Prior to a reset, we currently halt new requests from being allocated by
>> freezing the request queues. We unfreeze the queues after the new state
>> of the hctx's is established. This way all IO requests that were gating
>> on the unfreeze are guaranteed to enter into a valid context.
>> You are proposing to skip freeze on a reset. New requests will then be
>> allocated before we've established the hctx map. Any request allocated
>> will have to be terminated in failure if the hctx is no longer valid
>> once the reset completes.
> Yes, if any previous hctx doesn't come back, the requests on that hctx
> will be drained with BLK_STS_IOERR, via:
> -> blk_mq_freeze_queue
>    -> blk_freeze_queue
>       -> blk_mq_freeze_queue_wait
> But the nvmeq's cq_vector is -1.
>> Yes, it's entirely possible today a request allocated prior to the reset
>> may need to be terminated after the reset. There's nothing we can do
>> about those except end them in failure, but we can prevent new ones from
>> sharing the same fate. You are removing that prevention, and that's what
>> I am complaining about.
> Thanks again for your precious time to detail this.
> So what you are concerned about is that this patch doesn't freeze the queues
> for the reset case any more, so new requests may enter, and they will be
> failed if the hctx doesn't come back during the reset procedure. This should
> be avoided.
> I will change this in the next version (V3).
>> * Future consideration: we recently obtained a way to "steal" bios that
>> looks like it may be used to back out certain types of requests and let
>> the bio create a new one.
> Yeah, that would be a great way to reduce the loss when an hctx is gone.