Re: [PATCH 3/6] nvme: Move all IO out of controller reset

Ming Lei Mon, 21 May 2018 08:35:15 -0700

On Mon, May 21, 2018 at 09:03:31AM -0600, Keith Busch wrote:
> On Mon, May 21, 2018 at 10:58:51PM +0800, Ming Lei wrote:
> > On Mon, May 21, 2018 at 08:22:19AM -0600, Keith Busch wrote:
> > > On Sat, May 19, 2018 at 07:03:58AM +0800, Ming Lei wrote:
> > > > On Fri, May 18, 2018 at 10:38:20AM -0600, Keith Busch wrote:
> > > > > +
> > > > > +     if (unfreeze)
> > > > > +             nvme_wait_freeze(&dev->ctrl);
> > > > > +
> > > > 
> > > > timeout may comes just before&during blk_mq_update_nr_hw_queues() or
> > > > the above nvme_wait_freeze(), then both two may hang forever.
> > > 
> > > Why would it hang forever? The scan_work doesn't stop a timeout from
> > > triggering a reset to reclaim requests necessary to complete a freeze.
> > 
> > nvme_dev_disable() will quiesce queues, then nvme_wait_freeze() or
> > blk_mq_update_nr_hw_queues() may hang forever.
> 
> nvme_dev_disable is just the first part of the timeout sequence. You
> have to follow it through to the reset_work that either restarts or
> kills the queues.


nvme_dev_disable() quiesces queues first before killing queues.

If queues are quiesced during or before nvme_wait_freeze() is run
from the 2nd part of reset, the 2nd part can't move on, and IO hang
is caused. Finally no reset can be scheduled at all.

Thanks,
Ming

Re: [PATCH 3/6] nvme: Move all IO out of controller reset

Reply via email to