On Wednesday, October 27, 2010, Rafael J. Wysocki wrote:
> On Tuesday, October 26, 2010, Mathieu Desnoyers wrote:
> > * Alan Stern ([email protected]) wrote:
> > > On Tue, 26 Oct 2010, Mathieu Desnoyers wrote:
> > > 
> > > > * Peter Zijlstra ([email protected]) wrote:
> > > > > On Tue, 2010-10-26 at 11:56 -0500, Pierre Tardy wrote:
> > > > > > 
> > > > > > +       trace_runtime_pm_usage(dev, atomic_read(&dev->power.usage_count)+1);
> > > > > >         atomic_inc(&dev->power.usage_count);
> > > > > 
> > > > > That's terribly racy..
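> > > > > 
> > > > > The count can change between the atomic_read() and the atomic_inc(),
> > > > > so the traced value can be stale. A minimal sketch of a non-racy
> > > > > alternative, assuming the tracepoint only wants the post-increment
> > > > > value:
> > > > > 
> > > > > +       /* increment and read back the new value in one atomic step,
> > > > > +        * so the traced count cannot be stale */
> > > > > +       trace_runtime_pm_usage(dev, atomic_inc_return(&dev->power.usage_count));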
> > > > 
> > > > Looking at the original code, it looks racy even without considering the
> > > > tracepoint:
> > > > 
> > > > int __pm_runtime_get(struct device *dev, bool sync)
> > > >  {
> > > >         int retval;
> > > > 
> > > > +       trace_runtime_pm_usage(dev, atomic_read(&dev->power.usage_count)+1);
> > > >         atomic_inc(&dev->power.usage_count);
> > > >         retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
> > > > 
> > > > There is no implied memory barrier after "atomic_inc". So either all
> > > > these inc/dec are protected with mutexes or spinlocks, in which case
> > > > one might wonder why atomic operations are used at all, or it's a racy
> > > > mess. (I vote for the second option.)
> > > 
> > > I don't understand.  What's the problem?  The inc/dec are atomic 
> > > because they are not protected by spinlocks, but everything else is 
> > > (aside from the tracepoint, which is new).
> > > 
> > > > kref should certainly be used there.
> > > 
> > > What for?
> > 
> > kref has the following "get":
> > 
> >         atomic_inc(&kref->refcount);
> >         smp_mb__after_atomic_inc();
> > 
> > What seems to be missing in __pm_runtime_get() and pm_runtime_get_noresume()
> > is the memory barrier after the atomic increment. The atomic increment is
> > free to be reordered into the following spinlock (within pm_runtime_resume()
> > or pm_request_resume() execution), because taking a spinlock only acts as a
> > memory barrier with acquire semantics, not a full memory barrier.
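> > 
> > In other words, nothing stops the increment from becoming globally
> > visible only inside the critical section. A sketch of the window (not
> > actual code):
> > 
> > 	atomic_inc(&dev->power.usage_count);	/* store may still be CPU-local */
> > 	spin_lock_irqsave(&dev->power.lock, flags);
> > 	/* acquire semantics only keep accesses below from moving above the
> > 	 * lock; they do not force the inc above to be visible by this point */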
> >
> > So AFAIU, the failure scenario would be as follows (sorry for the 80+
> > columns):
> > 
> > initial conditions: usage_count = 1
> > 
> > CPU A                                                       CPU B
> > 1) __pm_runtime_get() (sync = true)
> > 2)   atomic_inc(&usage_count) (not committed to memory yet)
> > 3)   pm_runtime_resume()
> > 4)     spin_lock_irqsave(&dev->power.lock, flags);
> > 5)     retval = __pm_request_resume(dev);
> 
> If sync = true this is
>            retval = __pm_runtime_resume(dev);
> which drops and reacquires the spinlock.  In the meantime it sets
> ->power.runtime_status so that __pm_runtime_idle() will fail if run at this
> point.
> 
> > 6)     (execute the body of __pm_request_resume and return)
> > 7)                                                          __pm_runtime_put() (sync = true)
> > 8)                                                          if (atomic_dec_and_test(&dev->power.usage_count))
> >                                                               (still see usage_count == 1 before decrement,
> >                                                                thus decrement to 0)
> > 9)                                                             pm_runtime_idle()
> > 10)  spin_unlock_irqrestore(&dev->power.lock, flags)
> > 11)                                                            spin_lock_irq(&dev->power.lock);
> > 12)                                                            retval = __pm_runtime_idle(dev);
> 
> Moreover, __pm_runtime_idle() checks ->power.usage_count under the spinlock,
> so it will see it's been incremented in the meantime and it will back off.
> 
> > 13)                                                            spin_unlock_irq(&dev->power.lock);
> > 
> > So we end up in a situation where CPU A expects the device to be resumed,
> > but the last action performed has been to bring it to idle.
> >
> > A smp_mb__after_atomic_inc() between lines 2 and 3 would fix this.
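> > 
> > I.e., roughly (a sketch, assuming the barrier is all that is needed):
> > 
> > 	atomic_inc(&dev->power.usage_count);
> > 	/* full barrier: make the increment globally visible before the
> > 	 * resume path takes dev->power.lock */
> > 	smp_mb__after_atomic_inc();
> > 	retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);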
> 
> I don't think this particular race is possible.  However, there is another
> one that seems to be possible (in a different function), which an explicit
> barrier will prevent from happening.
> 
> It's related to pm_runtime_get_noresume(), but I think it's better to put
> the barrier where it's necessary rather than into pm_runtime_get_noresume()
> itself.

Actually, no.  Since rpm_idle() and rpm_suspend() both check usage_count under
the spinlock, the race I was thinking about doesn't appear to be possible after
all.
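
The relevant pattern, roughly (a sketch of the check in rpm_idle(), not the
exact code):

	spin_lock_irq(&dev->power.lock);
	/* the usage_count test is done under dev->power.lock; a CPU that
	 * incremented the count before going through the same lock in the
	 * resume path is visible here, so the idle attempt backs off
	 * instead of racing with the pending resume */
	if (atomic_read(&dev->power.usage_count) > 0)
		retval = -EAGAIN;
	/* ... other checks and the actual idle callback ... */
	spin_unlock_irq(&dev->power.lock);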

Thanks,
Rafael