On Mon, Jul 16, 2018 at 7:40 AM Michael Ellerman <m...@ellerman.id.au> wrote:
>
> If the numbers can be trusted it is actually slower to put the sync in
> lock, at least on one of the machines:
>
>               Time
> lwsync_sync   84,932,987,977
> sync_lwsync   93,185,930,333

Very funky.

> I guess arguably it's not a very macro benchmark, but we have a
> context_switch benchmark in the tree[1] which we often use to tune
> things, and it degrades badly. It just spins up two threads and has them
> ping-pong using yield.

I hacked that up to run on x86, and it only is about 5% locking
overhead in my profiles. It's about 18% __switch_to, and a lot of
system call entry/exit, but not a lot of locking.

I'm actually surprised it is even that much locking, since it seems to
be single-cpu, so there should be no contention and the lock (which
seems to be

        rq = this_rq();
        rq_lock(rq, &rf);

in do_sched_yield()) should stay local to the cpu.

And for you the locking is apparently even _more_ noticeable.

But yes, a 10% regression on that context switch thing is huge. You
shouldn't do ping-pong stuff, but people kind of do.

              Linus

Reply via email to