On Mon, Jul 16, 2018 at 7:40 AM Michael Ellerman <m...@ellerman.id.au> wrote:
>
> If the numbers can be trusted it is actually slower to put the sync in
> lock, at least on one of the machines:
>
>                       Time
>   lwsync_sync     84,932,987,977
>   sync_lwsync     93,185,930,333
Very funky.

> I guess arguably it's not a very macro benchmark, but we have a
> context_switch benchmark in the tree[1] which we often use to tune
> things, and it degrades badly. It just spins up two threads and has them
> ping-pong using yield.

I hacked that up to run on x86, and it's only about 5% locking
overhead in my profiles. It's about 18% __switch_to, and a lot of
system call entry/exit, but not a lot of locking.

I'm actually surprised it is even that much locking, since it seems to
be single-cpu, so there should be no contention, and the lock (which
seems to be the

        rq = this_rq();
        rq_lock(rq, &rf);

in do_sched_yield()) should stay local to the cpu.

And for you the locking is apparently even _more_ noticeable.

But yes, a 10% regression on that context switch thing is huge. You
shouldn't do ping-pong stuff, but people kind of do.

              Linus