On Tue, 2017-04-18 at 16:56 -0700, James Bottomley wrote:
> diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
> index e5a2d590a104..31171204cfd1 100644
> --- a/drivers/scsi/scsi_lib.c
> +++ b/drivers/scsi/scsi_lib.c
> @@ -2611,7 +2611,6 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>               case SDEV_QUIESCE:
>               case SDEV_OFFLINE:
>               case SDEV_TRANSPORT_OFFLINE:
> -             case SDEV_BLOCK:
>                       break;
>               default:
>                       goto illegal;
> @@ -2625,6 +2624,7 @@ scsi_device_set_state(struct scsi_device *sdev, enum scsi_device_state state)
>               case SDEV_OFFLINE:
>               case SDEV_TRANSPORT_OFFLINE:
>               case SDEV_CANCEL:
> +             case SDEV_BLOCK:
>               case SDEV_CREATED_BLOCK:
>                       break;
>               default:
> diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c
> index 82dfe07b1d47..e477f95bf169 100644
> --- a/drivers/scsi/scsi_sysfs.c
> +++ b/drivers/scsi/scsi_sysfs.c
> @@ -1282,8 +1282,17 @@ void __scsi_remove_device(struct scsi_device *sdev)
>               return;
>  
>       if (sdev->is_visible) {
> -             if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0)
> -                     return;
> +             /*
> +              * If blocked, we go straight to DEL so any commands
> +              * issued during the driver shutdown (like sync cache)
> +              * are errored
> +              */
> +             if (scsi_device_set_state(sdev, SDEV_CANCEL) != 0) {
> +                     if (scsi_device_set_state(sdev, SDEV_DEL) != 0)
> +                             return;
> +                     else
> +                             scsi_start_queue(sdev);
> +             }
>  
>               bsg_unregister_queue(sdev->request_queue);
>               device_unregister(&sdev->sdev_dev);

Hello James,

This approach cannot work. A scsi_target_block() call by the transport
layer can run concurrently with __scsi_remove_device(), so it can occur at
any time between the scsi_start_queue() call in __scsi_remove_device() and
the sd_shutdown() call, which results in a deadlock. I have been able to
trigger this in my tests by simulating a cable pull shortly before running
"rmmod ib_srp".
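
The window described above can be sketched with a toy model. This is not
kernel code: the toy_* names are hypothetical, the queue is reduced to a
single "stopped" flag, and the assumption that a synchronous shutdown
command cannot complete on a stopped queue is a simplification of the real
behavior. It only shows that a block landing between the queue restart and
the shutdown command leaves the command with nothing to dispatch it.

```c
#include <stdbool.h>

/* Toy model only: none of these names are kernel symbols. */
struct toy_queue {
	bool stopped;	/* true while the queue is blocked */
};

/* Stand-in for the remover restarting the queue after the state change. */
static void toy_start_queue(struct toy_queue *q)
{
	q->stopped = false;
}

/* Stand-in for a concurrent transport-layer block, e.g. after a cable pull. */
static void toy_block(struct toy_queue *q)
{
	q->stopped = true;
}

/*
 * Stand-in for a synchronous shutdown command. In this model it can only
 * complete if the queue is running when it is submitted; a false return
 * marks the case where a real synchronous waiter would hang.
 */
static bool toy_sync_shutdown(const struct toy_queue *q)
{
	return !q->stopped;
}

/*
 * Run the remover's two steps with one block injected at block_at:
 *   0 = before the queue restart
 *   1 = between the queue restart and the shutdown command
 *   2 = after the shutdown command
 * Returns true if the shutdown command completes in this model.
 */
static bool toy_remove_completes(int block_at)
{
	struct toy_queue q = { .stopped = true };	/* assume a blocked device */
	bool done;

	if (block_at == 0)
		toy_block(&q);
	toy_start_queue(&q);
	if (block_at == 1)
		toy_block(&q);
	done = toy_sync_shutdown(&q);
	if (block_at == 2)
		toy_block(&q);
	return done;
}
```

In this model only toy_remove_completes(1) returns false, which is the
interleaving between the queue restart and the shutdown command.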

That deadlock did not occur with the patch series that makes the synchronize
cache command issued upon shutdown asynchronous. I'm going to resubmit that
patch series.

Bart.
