On Sat, 2018-08-04 at 18:01 +0800, Ming Lei wrote:
> On Sat, Aug 4, 2018 at 8:03 AM, Bart Van Assche <[email protected]> 
> wrote:
> > 
> > diff --git a/block/blk-pm.c b/block/blk-pm.c
> > index 2a4632d0be4b..0708185ead61 100644
> > --- a/block/blk-pm.c
> > +++ b/block/blk-pm.c
> > @@ -107,17 +107,31 @@ int blk_pre_runtime_suspend(struct request_queue *q)
> >         if (!q->dev)
> >                 return ret;
> > 
> > +       WARN_ON_ONCE(q->rpm_status != RPM_ACTIVE);
> > +
> >         blk_pm_runtime_lock(q);
> > +       /*
> > +        * Switch to preempt-only mode before calling synchronize_rcu() such
> > +        * that later blk_queue_enter() calls see the preempt-only state. 
> > See
> > +        * also http://lwn.net/Articles/573497/.
> > +        */
> > +       blk_set_pm_only(q);
> > +       ret = -EBUSY;
> > +       if (percpu_ref_read(&q->q_usage_counter) == 1) {
> > +               synchronize_rcu();
> > +               if (percpu_ref_read(&q->q_usage_counter) == 1)
> > +                       ret = 0;
> > +       }
> 
> The above code looks too tricky, and the following issue might exist in case
> of potential WRITEs re-order from blk_queue_enter().
> 
> 1) suppose there are one in-flight request not completed yet
> 
> 2) observer is the percpu_ref_read() following synchronize_rcu().
> 
> 3) inside blk_queue_enter(), WRITE in 
> percpu_ref_tryget_live()/percpu_ref_put()
> can be re-ordered from view of the observer in 2)
> 
> 4) then percpu_ref_read() may return 1 even though the only in-flight
> request isn't completed
> 
> 5) suspend still moves on, and data loss may be triggered
> 
> Cc Paul, and Alan have been CCed already, so we have several memory
> model experts here, :-), hope they may help to clarify if this reorder case
> may exist.

Hello Ming,

There are two cases:
(a) The percpu_ref_tryget_live() call in (3) is triggered by a runtime power
    state transition (RPM_*).
(b) The percpu_ref_tryget_live() call in (3) is not triggered by a runtime
    power state transition.

In case (b) the driver that submits the request should protect the request
with scsi_autopm_get_device() / scsi_autopm_put_device() or by any other
mechanism that prevents runtime power state transitions. Since
scsi_autopm_get_device() is called before blk_queue_enter(), since
scsi_autopm_get_device() only returns after it has resumed a device and since
scsi_autopm_get_device() prevents runtime suspend, the function
blk_pre_runtime_suspend() won't be called concurrently with blk_queue_enter()
in case (b).

Since the power management serializes power state transitions also for case
(a) the race described above won't happen. blk_pre_runtime_suspend() will have
finished before any RQF_PM requests are submitted.

Bart.


Reply via email to