Are you sure about this? I have a core dump of a lockup in the same place
(the state machine used for powering a CPU down for the task swap), taken
from a 3.13 kernel (+ upstream patches), and this commit wasn't backported
to it yet.
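For reference, the states involved (paraphrasing kernel/stop_machine.c from
that era; my 3.13-based tree might differ slightly, so take this as a sketch):

        enum multi_stop_state {
                /* Dummy starting state for thread. */
                MULTI_STOP_NONE,
                /* Awaiting everyone to be scheduled. */
                MULTI_STOP_PREPARE,
                /* Disable interrupts. */
                MULTI_STOP_DISABLE_IRQ,
                /* Run the function. */
                MULTI_STOP_RUN,
                /* Exit. */
                MULTI_STOP_EXIT,
        };

So MULTI_STOP_EXIT ends up being 4, which is why the disassembly below
compares %edx against $0x4.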
-> multi_cpu_stop -> do { } while (curstate != MULTI_STOP_EXIT);

In my case, curstate is WAY off from anything in the enum that contains
MULTI_STOP_EXIT (4). Registers are totally messed up (probably after
cpu_relax(), right where you were trapped -> right after the pause
instruction).

My case:

PID: 118    TASK: ffff883fd28ec7d0   CPU: 9   COMMAND: "migration/9"
...
 [exception RIP: multi_cpu_stop+0x64]
    RIP: ffffffff810f5944  RSP: ffff883fd2907d98  RFLAGS: 00000246
    RAX: 0000000000000010  RBX: 0000000000000010  RCX: 0000000000000246
    RDX: ffff883fd2907d98  RSI: 0000000000000000  RDI: 0000000000000001
    RBP: ffffffff810f5944   R8: ffffffff810f5944   R9: 0000000000000000
    R10: ffff883fd2907d98  R11: 0000000000000246  R12: ffffffffffffffff
    R13: ffff883f55d01b48  R14: 0000000000000000  R15: 0000000000000001
    ORIG_RAX: 0000000000000001  CS: 0010  SS: 0000
--- <NMI exception stack> ---
 #4 [ffff883fd2907d98] multi_cpu_stop+0x64 at ffffffff810f5944

208        } while (curstate != MULTI_STOP_EXIT);   ---> RIP

RIP 0xffffffff810f5944 <+100>: cmp $0x4,%edx   ---> CHECKING FOR MULTI_STOP_EXIT
RDX: ffff883fd2907d98 -> does not make any sense

###

If I'm reading this right:

"""
CPU 05 - PID 14990

do_numa_page
  task_numa_fault
    numa_migrate_preferred
      task_numa_migrate
        migrate_swap (curr: 14990, task: 14996)
          stop_two_cpus (cpu1=05(14996), cpu2=00(14990))
            wait_for_completion

14990 - CPU05
14996 - CPU00

stop_two_cpus:
  multi_stop_data (msdata->state = MULTI_STOP_PREPARE)
  smp_call_function_single (min=cpu2=00, irq_cpu_stop_queue_work, wait=1)
  smp_call_function_single (ran on lowest CPU, 00 for this case)
    irq_cpu_stop_queue_work
      cpu_stop_queue_work(cpu1=05(14996))  # add work (multi_cpu_stop) to cpu 05 cpu_stopper queue
      cpu_stop_queue_work(cpu2=00(14990))  # add work (multi_cpu_stop) to cpu 00 cpu_stopper queue
  wait_for_completion()  --> HERE
"""

In my case, checking the task structs of the tasks sitting in
wait_for_completion():

PID 14990 CPU 05 -> PID 14996 CPU 00
PID 14991 CPU 30 -> PID 14998 CPU 01
PID 14992 CPU 30 -> PID 14998 CPU 01
PID 14996 CPU 00 -> PID 14992 CPU 30
PID 14998 CPU 01 -> PID 14990 CPU 05

AND

>  102   2   6  ffff881fd2ea97f0  RU  0.0  0  0  [migration/6]
>  118   2   9  ffff883fd28ec7d0  RU  0.0  0  0  [migration/9]
>  143   2  14  ffff883fd29d47d0  RU  0.0  0  0  [migration/14]
>  148   2  15  ffff883fd29fc7d0  RU  0.0  0  0  [migration/15]
>  153   2  16  ffff881fd2f517f0  RU  0.0  0  0  [migration/16]

THEN

I am still waiting for 5 cpu_stopper_thread -> multi_cpu_stop works that
were just scheduled (probably sitting in the per-cpu queues of CPUs 0, 1,
5 and 30), not running yet.

AND

I don't have any wait_for_completion() waiter left for those "OLDER"
migration threads (6, 9, 14, 15 and 16). Probably done.completion was
signalled (and wait_for_completion() returned) before the race.

Looks like something got messed up with curstate in the multi_cpu_stop()
state machine:

        /* Simple state machine */
        do {
                /* Chill out and ensure we re-read multi_stop_state. */
                cpu_relax();

cpu_relax() maybe ?
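For context, the whole loop we are stuck in looks roughly like this
(paraphrased from kernel/stop_machine.c of that era; take it as a sketch,
the exact code in my tree may differ a bit):

        static int multi_cpu_stop(void *data)
        {
                struct multi_stop_data *msdata = data;
                enum multi_stop_state curstate = MULTI_STOP_NONE;
                int cpu = smp_processor_id(), err = 0;
                unsigned long flags;
                bool is_active;

                /* irqs may already be disabled; save and restore on exit */
                local_save_flags(flags);

                if (!msdata->active_cpus)
                        is_active = cpu == cpumask_first(cpu_online_mask);
                else
                        is_active = cpumask_test_cpu(cpu, msdata->active_cpus);

                /* Simple state machine */
                do {
                        /* Chill out and ensure we re-read multi_stop_state. */
                        cpu_relax();
                        if (msdata->state != curstate) {
                                curstate = msdata->state;
                                switch (curstate) {
                                case MULTI_STOP_DISABLE_IRQ:
                                        local_irq_disable();
                                        hard_irq_disable();
                                        break;
                                case MULTI_STOP_RUN:
                                        if (is_active)
                                                err = msdata->fn(msdata->data);
                                        break;
                                default:
                                        break;
                                }
                                ack_state(msdata);
                        }
                } while (curstate != MULTI_STOP_EXIT);

                local_irq_restore(flags);
                return err;
        }

curstate is a local and is only ever assigned from msdata->state, and (if I
read the code right) msdata lives on the stack of the task sleeping in
stop_two_cpus() -> wait_for_completion(). If that completion got signalled
too early and the caller's stack was reused, multi_cpu_stop() would read
exactly this kind of garbage out of msdata.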
--
Rafael Tinoco

On Fri, Mar 6, 2015 at 9:32 AM, Ingo Molnar <mi...@kernel.org> wrote:
>
> * Sasha Levin <sasha.le...@oracle.com> wrote:
>
>> I've bisected this to "locking/rwsem: Check for active lock before bailing
>> on spinning". Relevant parties Cc'ed.
>
> That would be:
>
>   1a99367023f6 ("locking/rwsem: Check for active lock before bailing on spinning")
>
> attached below.
>
> Thanks,
>
>         Ingo
>
> ===========================>
> From 1a99367023f6ac664365a37fa508b059e31d0e88 Mon Sep 17 00:00:00 2001
> From: Davidlohr Bueso <d...@stgolabs.net>
> Date: Fri, 30 Jan 2015 01:14:27 -0800
> Subject: [PATCH] locking/rwsem: Check for active lock before bailing on
>  spinning
>
> 37e9562453b ("locking/rwsem: Allow conservative optimistic
> spinning when readers have lock") forced the default for
> optimistic spinning to be disabled if the lock owner was
> nil, which makes much sense for readers. However, while
> it is not our priority, we can make some optimizations
> for write-mostly workloads. We can bail the spinning step
> and still be conservative if there are any active tasks,
> otherwise there's really no reason not to spin, as the
> semaphore is most likely unlocked.
>
> This patch recovers most of a Unixbench 'execl' benchmark
> throughput by sleeping less and making better average system
> usage:
>
> before:
> CPU     %user   %nice   %system   %iowait   %steal   %idle
> all      0.60    0.00      8.02      0.00     0.00   91.38
>
> after:
> CPU     %user   %nice   %system   %iowait   %steal   %idle
> all      1.22    0.00     70.18      0.00     0.00   28.60
>
> Signed-off-by: Davidlohr Bueso <dbu...@suse.de>
> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> Acked-by: Jason Low <jason.l...@hp.com>
> Cc: Linus Torvalds <torva...@linux-foundation.org>
> Cc: Michel Lespinasse <wal...@google.com>
> Cc: Paul E. McKenney <paul...@linux.vnet.ibm.com>
> Cc: Tim Chen <tim.c.c...@linux.intel.com>
> Link: http://lkml.kernel.org/r/1422609267-15102-6-git-send-email-d...@stgolabs.net
> Signed-off-by: Ingo Molnar <mi...@kernel.org>
> ---
>  kernel/locking/rwsem-xadd.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
>
> diff --git a/kernel/locking/rwsem-xadd.c b/kernel/locking/rwsem-xadd.c
> index 1c0d11e8ce34..e4ad019e23f5 100644
> --- a/kernel/locking/rwsem-xadd.c
> +++ b/kernel/locking/rwsem-xadd.c
> @@ -298,23 +298,30 @@ static inline bool rwsem_try_write_lock_unqueued(struct rw_semaphore *sem)
>  static inline bool rwsem_can_spin_on_owner(struct rw_semaphore *sem)
>  {
>         struct task_struct *owner;
> -       bool on_cpu = false;
> +       bool ret = true;
>
>         if (need_resched())
>                 return false;
>
>         rcu_read_lock();
>         owner = ACCESS_ONCE(sem->owner);
> -       if (owner)
> -               on_cpu = owner->on_cpu;
> -       rcu_read_unlock();
> +       if (!owner) {
> +               long count = ACCESS_ONCE(sem->count);
> +               /*
> +                * If sem->owner is not set, yet we have just recently entered the
> +                * slowpath with the lock being active, then there is a possibility
> +                * reader(s) may have the lock. To be safe, bail spinning in these
> +                * situations.
> +                */
> +               if (count & RWSEM_ACTIVE_MASK)
> +                       ret = false;
> +               goto done;
> +       }
>
> -       /*
> -        * If sem->owner is not set, yet we have just recently entered the
> -        * slowpath, then there is a possibility reader(s) may have the lock.
> -        * To be safe, avoid spinning in these situations.
> -        */
> -       return on_cpu;
> +       ret = owner->on_cpu;
> +done:
> +       rcu_read_unlock();
> +       return ret;
>  }
>
>  static inline bool owner_running(struct rw_semaphore *sem,