On Fri, 2015-03-06 at 09:19 -0800, Davidlohr Bueso wrote:
> On Fri, 2015-03-06 at 13:32 +0100, Ingo Molnar wrote:
> > * Sasha Levin <sasha.le...@oracle.com> wrote:
> > 
> > > I've bisected this to "locking/rwsem: Check for active lock before 
> > > bailing on spinning". Relevant parties Cc'ed.
> > 
> > That would be:
> > 
> >   1a99367023f6 ("locking/rwsem: Check for active lock before bailing on spinning")
> 
> > diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> > index 1c0d11e8ce34..e4ad019e23f5 100644
> > --- a/kernel/locking/rwsem-xadd.c
> > +++ b/kernel/locking/rwsem-xadd.c
> > @@ -298,23 +298,30 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
> >  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
> >  {
> >     struct task_struct *owner;
> > -   bool on_cpu = false;
> > +   bool ret = true;
> >  
> >     if (need_resched())
> >             return false;
> >  
> >     rcu_read_lock();
> >     owner = ACCESS_ONCE(sem->owner);
> > -   if (owner)
> > -           on_cpu = owner->on_cpu;
> > -   rcu_read_unlock();
> > +   if (!owner) {
> > +           long count = ACCESS_ONCE(sem->count);
> > +           /*
> > +           * If sem->owner is not set, yet we have just recently entered the
> > +           * slowpath with the lock being active, then there is a possibility
> > +           * reader(s) may have the lock. To be safe, bail spinning in these
> > +           * situations.
> > +           */
> > +           if (count & RWSEM_ACTIVE_MASK)
> > +                   ret = false;
> > +           goto done;
> 
> Hmmm, so the lockup would be due to this (when owner is non-nil the
> patch has no effect): it tells users to spin instead of sleep -- _except_
> under this condition. And while spinning we always check need_resched to
> be safe. So even if this function were completely bogus, we'd end up
> needlessly spinning, but I'm surprised about the lockup. Maybe coffee
> will make things clearer.

Right, can_spin_on_owner() was originally added to the mutex spinning
code as an optimization, specifically so that we can avoid adding a
spinner to the OSQ only to find that it doesn't need to spin. Since the
function only gates that optimization, an incorrect return value should
really only affect performance, so yes, lockups due to this seem
surprising.
