* Alan Stern ([email protected]) wrote:
> On Tue, 26 Oct 2010, Mathieu Desnoyers wrote:
>
> > * Peter Zijlstra ([email protected]) wrote:
> > > On Tue, 2010-10-26 at 11:56 -0500, Pierre Tardy wrote:
> > > >
> > > > +	trace_runtime_pm_usage(dev, atomic_read(&dev->power.usage_count)+1);
> > > > 	atomic_inc(&dev->power.usage_count);
> > >
> > > That's terribly racy..
> >
> > Looking at the original code, it looks racy even without considering the
> > tracepoint:
> >
> > int __pm_runtime_get(struct device *dev, bool sync)
> > {
> > 	int retval;
> >
> > +	trace_runtime_pm_usage(dev, atomic_read(&dev->power.usage_count)+1);
> > 	atomic_inc(&dev->power.usage_count);
> > 	retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);
> >
> > There is no implied memory barrier after "atomic_inc". So either all these
> > inc/dec are protected with mutexes or spinlocks, in which case one might
> > wonder why atomic operations are used at all, or it's a racy mess. (I vote
> > for the second option.)
>
> I don't understand. What's the problem? The inc/dec are atomic
> because they are not protected by spinlocks, but everything else is
> (aside from the tracepoint, which is new).
>
> > kref should certainly be used there.
>
> What for?
kref has the following "get":

	atomic_inc(&kref->refcount);
	smp_mb__after_atomic_inc();
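For reference, a minimal sketch of the whole helper as it reads in lib/kref.c
around this kernel version (the WARN_ON is incidental here; the barrier is the
point):

	void kref_get(struct kref *kref)
	{
		WARN_ON(!atomic_read(&kref->refcount));
		atomic_inc(&kref->refcount);
		smp_mb__after_atomic_inc();	/* full barrier: the increment
						 * is globally visible before
						 * any later access by this CPU */
	}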
What seems to be missing in __pm_runtime_get() and pm_runtime_get_noresume() is
a memory barrier after the atomic increment. The increment is free to be
reordered into the following spinlock-protected critical section (inside
pm_runtime_resume() or pm_request_resume()), because taking a spinlock only
acts as a memory barrier with acquire semantics, not as a full memory barrier.
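To make the ordering explicit, here is a sketch, assuming a weakly ordered
architecture (on x86 the locked increment already implies a full barrier and
smp_mb__after_atomic_inc() is a no-op); the spin_lock_irq() stands in for the
one actually taken inside pm_runtime_resume():

	/* Current sequence: the store may sink into the critical section. */
	atomic_inc(&dev->power.usage_count);	/* no implied barrier */
	spin_lock_irq(&dev->power.lock);	/* acquire only: keeps later
						 * accesses below it, but does
						 * not force the store above to
						 * become visible first */

	/* kref-style sequence: the increment is committed first. */
	atomic_inc(&dev->power.usage_count);
	smp_mb__after_atomic_inc();		/* full memory barrier */
	spin_lock_irq(&dev->power.lock);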
So AFAIU, the failure scenario would be as follows (sorry for the 80+ columns):
initial conditions: usage_count = 1

CPU A                                           CPU B
1)  __pm_runtime_get() (sync = true)
2)  atomic_inc(&usage_count)
    (not committed to memory yet)
3)  pm_runtime_resume()
4)  spin_lock_irqsave(&dev->power.lock, flags);
5)  retval = __pm_request_resume(dev);
6)  (execute the body of __pm_request_resume
    and return)
                                                7)  __pm_runtime_put() (sync = true)
                                                8)  if (atomic_dec_and_test(&dev->power.usage_count))
                                                    (still sees usage_count == 1 before the
                                                    decrement, thus decrements to 0)
                                                9)  pm_runtime_idle()
10) spin_unlock_irqrestore(&dev->power.lock, flags)
                                                11) spin_lock_irq(&dev->power.lock);
                                                12) retval = __pm_runtime_idle(dev);
                                                13) spin_unlock_irq(&dev->power.lock);
So we end up in a situation where CPU A expects the device to be resumed, but
the last action performed has been to bring it to idle.
An smp_mb__after_atomic_inc() between steps 2 and 3 would fix this.
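Concretely, a sketch of the fix against the __pm_runtime_get() body quoted
above (leaving the tracepoint aside, and assuming the function otherwise just
returns retval, since the quoted excerpt cuts off):

	int __pm_runtime_get(struct device *dev, bool sync)
	{
		int retval;

		atomic_inc(&dev->power.usage_count);
		smp_mb__after_atomic_inc();	/* commit the increment before
						 * entering the resume path, so
						 * a concurrent put cannot miss
						 * it */
		retval = sync ? pm_runtime_resume(dev) : pm_request_resume(dev);

		return retval;
	}

pm_runtime_get_noresume() would want the same barrier after its atomic_inc().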
Thanks,
Mathieu
--
Mathieu Desnoyers
Operating System Efficiency R&D Consultant
EfficiOS Inc.
http://www.efficios.com