On Tue, 2017-11-07 at 22:42 +0000, Bart Van Assche wrote:
> On Tue, 2017-11-07 at 10:09 -0800, James Bottomley wrote:
> > 
> > but can you investigate the root cause rather than trying this
> > bandaid?
> 
> Hello James,
> 
> Thanks for your reply. I think that the root cause is that SCSI
> scanning activity can continue to submit I/O even after
> scsi_remove_host() has unlocked scan_mutex but that
> scsi_remove_host() removes some of the infrastructure that is
> essential to process SCSI requests.

That's not really a useful answer: how does it submit I/O after the
device goes into DEL?  In theory every I/O submitted after this is
returned with an immediate error.  I could buy the fact that we have
pending I/O submitted before we go into DEL, which would argue for some
sort of quiesce wait, but I don't see how I/O submitted after DEL
causes a hang.

>  Are you OK with
> e.g. moving a significant part of scsi_remove_host() into
> scsi_host_dev_release()?

Well not really without seeing the root cause.  Before
scsi_forget_host() it's all about state and after it's just removing
some user visible host attributes, so I can't see how either matters
much.  scsi_forget_host() must be executed from scsi_remove_host()
because that's how the devices go into the DEL state and how we error
the requests without troubling the device driver, so that can't be
moved to release.
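
To make the expected behaviour concrete, here is a toy model of the
state gate described above.  It is only a sketch: every name in it
(toy_device, toy_submit, toy_forget) is hypothetical and none of it is
the actual midlayer code.  It only illustrates the claim that, once a
device is in DEL, a newly submitted request is failed immediately and
never reaches the driver.

```c
#include <errno.h>

/* Toy model only: these names are hypothetical, not the kernel's API. */
enum toy_dev_state { TOY_RUNNING, TOY_DEL };

struct toy_device {
	enum toy_dev_state state;
	int driver_calls;	/* number of requests that reached the "driver" */
};

/* Submit path: a device in DEL fails the request before the driver sees it. */
static int toy_submit(struct toy_device *dev)
{
	if (dev->state == TOY_DEL)
		return -ENODEV;	/* immediate error, driver is not troubled */
	dev->driver_calls++;
	return 0;
}

/* Stand-in for the step in host removal that moves a device to DEL. */
static void toy_forget(struct toy_device *dev)
{
	dev->state = TOY_DEL;
}
```

If the real submit path behaves like toy_submit(), I/O issued after DEL
cannot by itself hang; only requests already in flight before the state
change would need some form of quiesce wait.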

James
