On Mon, Jul 16, 2018 at 7:40 AM Michael Ellerman <m...@ellerman.id.au> wrote:
>
> If the numbers can be trusted it is actually slower to put the sync in
> lock, at least on one of the machines:
>
>                       Time
>   lwsync_sync     84,932,987,977
>   sync_lwsync     93,185,930,333
Very funky.

> I guess arguably it's not a very macro benchmark, but we have a
> context_switch benchmark in the tree[1] which we often use to tune
> things, and it degrades badly. It just spins up two threads and has them
> ping-pong using yield.

I hacked that up to run on x86, and it's only about 5% locking
overhead in my profiles. It's about 18% __switch_to, and a lot of
system call entry/exit, but not a lot of locking.

I'm actually surprised it is even that much locking, since it seems to
be single-cpu, so there should be no contention, and the lock (which
seems to be the

        rq = this_rq();
        rq_lock(rq, &rf);

in do_sched_yield()) should stay local to the cpu.

And for you the locking is apparently even _more_ noticeable.

But yes, a 10% regression on that context switch thing is huge. You
shouldn't do ping-pong stuff, but people kind of do.

              Linus