On Wed, 2017-11-29 at 11:05 +0800, Jason Yan wrote:
> In commit fbce4d97fd43 ("scsi: fixup kernel warning during rmmod()"),
> we
> removed scsi_device_get() and directly called get_device() to
> increase
> the refcount of the device. But actullay scsi_device_get() will fail
> in
> three cases:
> 1. the scsi device is in SDEV_DEL or SDEV_CANCEL state
> 2. get_device() fail
> 3. the module is not alive
> 
> The intended purpose was to remove the check of the module alive.
> Unfortunately the check of the device state was droped too. And this
> introduced a race condition like this:
> 
>       CPU0                                           CPU1
> __scsi_remove_target()
>   ->iterate shost->__devices
>   ->scsi_remove_device()
>   ->put_device()
>       someone still hold a refcount
>                                                    sd_release()
>                                                       -
> >scsi_disk_put()
>                                                       ->put_device()
> last put and trigger the device release
> 
>   ->goto restart
>   ->iterate shost->__devices and got the same device
>   ->get_device() while refcount is 0

This analysis fails here: get_device() on something with refcount 0
returns NULL.  That triggers the if clause to ignore this device.

We may have a more complex way of triggering a dual put race as the
trace implies, but I don't think this is it.

[...]
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 50e7d7e..d398894 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1398,6 +1398,15 @@ void scsi_remove_device(struct scsi_device
> *sdev)
>  }
>  EXPORT_SYMBOL(scsi_remove_device);
>  
> +static int scsi_device_get_not_deleted(struct scsi_device *sdev)
> +{
> +     if (sdev->sdev_state == SDEV_DEL || sdev->sdev_state ==
> SDEV_CANCEL)
> +             return -ENXIO;
> +     if (!get_device(&sdev->sdev_gendev))
> +             return -ENXIO;
> +     return 0;
> +}

This is pretty much scsi_device_get() without the try_module get, so
they should probably be combined.

James

>  static void __scsi_remove_target(struct scsi_target *starget)
>  {
>       struct Scsi_Host *shost = dev_to_shost(starget->dev.parent);
> @@ -1415,7 +1424,7 @@ static void __scsi_remove_target(struct
> scsi_target *starget)
>                */
>               if (sdev->channel != starget->channel ||
>                   sdev->id != starget->id ||
> -                 !get_device(&sdev->sdev_gendev))
> +                 scsi_device_get_not_deleted(sdev))
>                       continue;
>               spin_unlock_irqrestore(shost->host_lock, flags);
>               scsi_remove_device(sdev);

Reply via email to