Re: [PATCH] scsi: fix race condition when removing target

Ewan D. Milne Wed, 29 Nov 2017 11:20:41 -0800

On Wed, 2017-11-29 at 19:11 +0000, Bart Van Assche wrote:
> On Wed, 2017-11-29 at 13:49 -0500, Ewan D. Milne wrote:
> > because a get inside a destructor would *always* be wrong, no?
> 
> Hello Ewan,
> 
> That's not what we are discussing. What can happen with the SCSI core is that
> get_device() is called concurrently with the destructor. get_device() can be
> called concurrently with the destructor because the destructor removes a
> device from the siblings list and because the SCSI core can call get_device()
> for devices it finds on the siblings list. Personally I think that design is
> superior compared to removing a SCSI device from the sibling list before the
> last put_device() call because the approach followed in the SCSI core leads to
> a simpler implementation. However, it seems like the current get_device()
> implementation does not yet support the SCSI core design ...
> 
> Bart.


OK, well, I think the point still stands, though: once the refcount
goes to zero and the destructor is invoked, a get that then increments
the refcount seems fundamentally wrong to me.  Especially if a
subsequent put causes the destructor to be invoked *simultaneously*
*on another thread*.  The locking has to happen somewhere, so why isn't
this done by the kobject?
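
To illustrate the kind of guard in question, here is a minimal userspace
sketch of a "get unless zero" operation, similar in spirit to the kernel's
kref_get_unless_zero().  It is not the kobject or SCSI core code: struct obj,
obj_init(), obj_get_unless_zero() and obj_put() are hypothetical names, and
C11 atomics stand in for the kernel's refcount primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; NOT the kernel's struct kobject. */
struct obj {
    atomic_int refcount;
    bool released;      /* set by the release path, for illustration only */
};

static void obj_init(struct obj *o)
{
    atomic_init(&o->refcount, 1);
    o->released = false;
}

/*
 * Take a reference only if the count is still nonzero.
 * Returns false once the count has reached zero, i.e. once the
 * destructor may already be running, so the count never goes 0 -> 1.
 */
static bool obj_get_unless_zero(struct obj *o)
{
    int old = atomic_load(&o->refcount);

    while (old != 0) {
        /* On failure, 'old' is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&o->refcount, &old, old + 1))
            return true;
    }
    return false;
}

/* Drop a reference; returns true if this call dropped the last one. */
static bool obj_put(struct obj *o)
{
    int old = atomic_fetch_sub(&o->refcount, 1);

    assert(old > 0);            /* a put without a matching get is a bug */
    if (old == 1) {
        o->released = true;     /* stand-in for the real destructor */
        return true;
    }
    return false;
}
```

With a guard like this, a lookup that finds the object on a list after the
last put gets a failure back instead of a reference, so the destructor
cannot be triggered a second time by a later put.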

Relying on the client code to get this right means that there are
opportunities all over the kernel for problems like this to happen,
just like here, where we inadvertently removed the state check that
prevented the get_device() call.

-Ewan
