On Mon, Jan 29, 2018 at 09:55:41PM +0200, Sagi Grimberg wrote:
> > Thanks for the fix. It looks like we still have a problem, though.
> > Commands submitted with the "shutdown_lock" held need to be able to make
> > forward progress without relying on a completion, but this one could
> > block indefinitely.
>
> Can you explain to me why is the shutdown_lock needed to synchronize
> nvme_dev_disable? More concretely, how is nvme_dev_disable different
> from other places where we rely on the ctrl state to serialize stuff?
>
> The only reason I see would be to protect against completion-after-abort
> scenario but I think the block layer should protect against it (checks
> if the request timeout timer fired).

We can probably find a way to use the state machine for this. Disabling
the controller predates the state machine, and the mutex is there to
protect against two actors shutting the controller down at the same
time, such as a hot removal racing with a timeout-triggered reset.
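
To make the two options concrete, here is a rough userspace sketch, not
driver code. Every toy_* name is made up for illustration, and it uses
pthreads and C11 atomics in place of the kernel primitives. It only
shows how a mutex and a state transition can each ensure that the
teardown runs once when two actors race:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states; not the real controller state machine. */
enum toy_state { TOY_LIVE, TOY_DISABLING, TOY_DEAD };

struct toy_ctrl {
	pthread_mutex_t shutdown_lock;	/* used by the mutex variant */
	bool disabled;			/* guarded by shutdown_lock */
	atomic_int state;		/* used by the state variant */
	atomic_int teardowns;		/* how often teardown really ran */
};

/* Stand-in for the actual shutdown work. */
static void toy_teardown(struct toy_ctrl *c)
{
	atomic_fetch_add(&c->teardowns, 1);
}

/* Mutex variant: the second actor blocks until the first is done. */
static void toy_disable_locked(struct toy_ctrl *c)
{
	pthread_mutex_lock(&c->shutdown_lock);
	if (!c->disabled) {
		toy_teardown(c);
		c->disabled = true;
	}
	pthread_mutex_unlock(&c->shutdown_lock);
}

/*
 * State variant: only the actor that wins LIVE -> DISABLING does the
 * teardown. The loser returns at once without waiting for it to finish.
 */
static bool toy_disable_by_state(struct toy_ctrl *c)
{
	int expected = TOY_LIVE;

	if (!atomic_compare_exchange_strong(&c->state, &expected,
					    TOY_DISABLING))
		return false;
	toy_teardown(c);
	atomic_store(&c->state, TOY_DEAD);
	return true;
}

static void *locked_actor(void *arg)
{
	toy_disable_locked(arg);
	return NULL;
}

static void *state_actor(void *arg)
{
	toy_disable_by_state(arg);
	return NULL;
}

/* Race two actors (think: hot removal vs. timeout reset), count teardowns. */
static int toy_race(void *(*actor)(void *))
{
	struct toy_ctrl c;
	pthread_t a, b;

	pthread_mutex_init(&c.shutdown_lock, NULL);
	c.disabled = false;
	atomic_init(&c.state, TOY_LIVE);
	atomic_init(&c.teardowns, 0);

	pthread_create(&a, NULL, actor, &c);
	pthread_create(&b, NULL, actor, &c);
	pthread_join(a, NULL);
	pthread_join(b, NULL);

	pthread_mutex_destroy(&c.shutdown_lock);
	return atomic_load(&c.teardowns);
}
```

Calling toy_race(locked_actor) or toy_race(state_actor) returns the
number of teardowns that ran. The visible difference in this toy is that
the mutex loser waits for the teardown to complete, while the state
loser returns immediately and would have to wait by some other means if
it needs the controller fully disabled.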