On Wed, May 23, 2018 at 08:35:40AM -0600, Keith Busch wrote:
> On Wed, May 23, 2018 at 08:34:48AM +0800, Ming Lei wrote:
> > Let's consider the normal NVMe timeout code path:
> > 
> > 1) one request is timed out;
> > 
> > 2) controller is shut down, this timed-out request is requeued from
> > nvme_cancel_request(), but can't be dispatched because queues are quiesced
> > 
> > 3) reset is done from another context, and this request is dispatched
> > again, and completed exactly before returning EH_HANDLED to blk-mq, but
> > its state isn't updated to COMPLETE yet.
> > 
> > 4) then double completions are done from both normal completion and timeout
> > path.
> 
> We're definitely fixing this, but I must admit that's an impressive
> cognitive traversal across 5 thread contexts to arrive at that race. :)

It can be only 2 thread contexts if the requeue is done on a polled request
from nvme_timeout(). :-)
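
For illustration, steps 3) and 4) can be replayed in a single thread with
a toy model. This is only a sketch under stated assumptions, not kernel
code and not necessarily the fix under discussion: struct toy_request,
toy_claim() and toy_replay_race() are hypothetical names, and the guard
shown (an atomic compare-and-exchange on the request state) is just one
possible way to let exactly one path complete the request.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical request states; names are illustrative, not blk-mq's. */
enum toy_state { TOY_IN_FLIGHT, TOY_COMPLETE };

struct toy_request {
	_Atomic int state;
	int completions;	/* how many times the completion handler ran */
};

/* Only the caller that moves IN_FLIGHT -> COMPLETE may complete the request. */
static bool toy_claim(struct toy_request *rq)
{
	int expected = TOY_IN_FLIGHT;

	return atomic_compare_exchange_strong(&rq->state, &expected,
					      TOY_COMPLETE);
}

/*
 * Replay the interleaving from steps 3) and 4) in a single thread and
 * return how many times the request was completed.
 */
int toy_replay_race(bool guarded)
{
	struct toy_request rq = { .state = TOY_IN_FLIGHT, .completions = 0 };

	if (guarded) {
		/* Normal completion claims the request, then completes it. */
		if (toy_claim(&rq))
			rq.completions++;
		/* Timeout path (EH_HANDLED) loses the claim and backs off. */
		if (toy_claim(&rq))
			rq.completions++;
	} else {
		/* Normal completion runs, but the state is not updated yet. */
		rq.completions++;
		/* Timeout path still sees IN_FLIGHT and completes again. */
		if (atomic_load(&rq.state) != TOY_COMPLETE)
			rq.completions++;
		/* The delayed state update arrives too late. */
		atomic_store(&rq.state, TOY_COMPLETE);
	}
	return rq.completions;
}
```

In this model toy_replay_race(false) returns 2 (the double completion),
while toy_replay_race(true) returns 1, because the second path fails the
compare-and-exchange and leaves the request alone.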

Thanks,
Ming
