On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"),
> we
> removed scsi_device_get() and directly called get_device() to
> increase
> the refcount of the device. But actullay scsi_device_get() will fail
> in
> three cases:
> 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
> 2. get_device() fail
> 3. the module is not alive
>
> The intended purpose was to remove the check of the module alive.
> Unfortunately the check of the device state was droped too. And this
> introduced a race condition like this:
>
> CPU0 CPU1
> __scsi_remove_target()
> ->iterate shost->__devices
> ->scsi_remove_device()
> ->put_device()
> someone still hold a refcount
> sd_release()
> -
> >scsi_disk_put()
> ->put_device()
> last put and trigger the device release
>
> ->goto restart
> ->iterate shost->__devices and got the same device
> ->get_device() while refcount is 0
This analysis fails here: get_device() on something with refcount 0
returns NULL. That triggers the if clause to ignore this device.
We may have a more complex way of triggering a dual put race as the
trace implies, but I don't think this is it.
[...]
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 50e7d7e..d398894 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device
> *sdev)
> }
> EXPORT_SYMBOL(scsi_remove_device);
>
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> + if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state ==
> SDEV_CANCEL)
> + return -ENXIO;
> + if (!get_device(&sdev->sdev_gendev))
> + return -ENXIO;
> + return 0;
> +}
This is pretty much scsi_device_get() without the try_module get, so
they should probably be combined.
James
> static void __scsi_remove_target(struct scsi_target *starget)
> {
> struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct
> scsi_target *starget)
> */
> if (sdev->channel != starget->channel ||
> sdev->id != starget->id ||
> - !get_device(&sdev->sdev_gendev))
> + scsi_device_get_not_deleted(sdev))
> continue;
> spin_unlock_irqrestore(shost->host_lock, flags);
> scsi_remove_device(sdev);