Hello,
On Thu, Jul 26, 2018 at 02:09:41PM +0000, Bart Van Assche wrote:
> On Thu, 2018-07-26 at 06:35 -0700, Tejun Heo wrote:
> > Making removal asynchronous this way sometimes causes issues because
> > whether the user sees the device released or not is racy.
> > kernfs/sysfs have mechanisms to deal with these cases - remove_self
> > and kernfs_break_active_protection(). Have you looked at those?
>
> Hello Tejun,
>
> The call stack in the patch description shows that sdev_store_delete() is
> involved in the deadlock. The implementation of that function is as follows:
>
> static ssize_t
> sdev_store_delete(struct device *dev, struct device_attribute *attr,
> const char *buf, size_t count)
> {
> if (device_remove_file_self(dev, attr))
> scsi_remove_device(to_scsi_device(dev));
> return count;
> };
>
> device_remove_file_self() calls sysfs_remove_file_self() and that last
> function calls kernfs_remove_self(). In other words, kernfs_remove_self()
> is already being used. Please let me know if I misunderstood your comment.
So, here, because scsi_remove_device() is the one involved in the
circular dependency, just breaking the dependency chain on the file
itself (self removal) isn't enough. You can wrap the whole operation
with kernfs_break_active_protection() to also move
scsi_remove_device() invocation outside the kernfs synchronization.
This will need to be piped through sysfs but shouldn't be too complex.
Thanks.
--
tejun