On Fri, 2017-05-12 at 15:15 +0200, Martin Wilck wrote:
> alua_rtpg() can race with alua_bus_detach(). The assertion that
> alua_dh_data *h->sdev must be non-NULL is not guaranteed because
> alua_bus_detach sets this field to NULL before removing the entry
> from the port group's dh_list.
> 
> This happens when a device is about to be removed, so don't BUG out
> but continue silently.
> 
> Signed-off-by: Martin Wilck <[email protected]>
> Reviewed-by: Hannes Reinecke <[email protected]>
> ---
>  drivers/scsi/device_handler/scsi_dh_alua.c | 20 ++++++++++++++------
>  1 file changed, 14 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/scsi/device_handler/scsi_dh_alua.c 
> b/drivers/scsi/device_handler/scsi_dh_alua.c
> index 2b60f493f90e..a59783020c66 100644
> --- a/drivers/scsi/device_handler/scsi_dh_alua.c
> +++ b/drivers/scsi/device_handler/scsi_dh_alua.c
> @@ -652,9 +652,15 @@ static int alua_rtpg(struct scsi_device *sdev, struct 
> alua_port_group *pg)
>                                       rcu_read_lock();
>                                       list_for_each_entry_rcu(h,
>                                               &tmp_pg->dh_list, node) {
> -                                             /* h->sdev should always be 
> valid */
> -                                             BUG_ON(!h->sdev);
> -                                             h->sdev->access_state = desc[0];
> +                                             /*
> +                                              * We might be racing with
> +                                              * alua_bus_detach here
> +                                              */
> +                                             struct scsi_device *sdev =
> +                                                     h->sdev;
> +                                             if (sdev)
> +                                                     sdev->access_state =
> +                                                             desc[0];
>                                       }
>                                       rcu_read_unlock();
>                               }
> @@ -694,11 +700,13 @@ static int alua_rtpg(struct scsi_device *sdev, struct 
> alua_port_group *pg)
>                       pg->expiry = 0;
>                       rcu_read_lock();
>                       list_for_each_entry_rcu(h, &pg->dh_list, node) {
> -                             BUG_ON(!h->sdev);
> -                             h->sdev->access_state =
> +                             struct scsi_device *sdev = h->sdev;
> +                             if (!sdev)
> +                                     continue;
> +                             sdev->access_state =
>                                       (pg->state & SCSI_ACCESS_STATE_MASK);
>                               if (pg->pref)
> -                                     h->sdev->access_state |=
> +                                     sdev->access_state |=
>                                               SCSI_ACCESS_STATE_PREFERRED;
>                       }
>                       rcu_read_unlock();

Hello Martin,

Allowing races like the one this patch tries to address to exist makes
the ALUA code harder to maintain than necessary. Have you considered to
make alua_bus_detach() wait until ALUA work has finished by using e.g.
cancel_work_sync() or rcu_synchronize()?

Bart.

Reply via email to