Add generic and target-specific support for local{,64}_try_cmpxchg
and wire up support for all targets that use the local_t infrastructure.

The patch enables x86 targets to emit a special instruction for
local_try_cmpxchg and, on x86_64, also for local64_try_cmpxchg.

The last patch changes __perf_output_begin in events/ring_buffer
to use the new locking primitive and improves the generated code from

     4b3:       48 8b 82 e8 00 00 00    mov    0xe8(%rdx),%rax
     4ba:       48 8b b8 08 04 00 00    mov    0x408(%rax),%rdi
     4c1:       8b 42 1c                mov    0x1c(%rdx),%eax
     4c4:       48 8b 4a 28             mov    0x28(%rdx),%rcx
     4c8:       85 c0                   test   %eax,%eax
     ...
     4ef:       48 89 c8                mov    %rcx,%rax
     4f2:       48 0f b1 7a 28          cmpxchg %rdi,0x28(%rdx)
     4f7:       48 39 c1                cmp    %rax,%rcx
     4fa:       75 b7                   jne    4b3 <...>

to

     4b2:       48 8b 4a 28             mov    0x28(%rdx),%rcx
     4b6:       48 8b 82 e8 00 00 00    mov    0xe8(%rdx),%rax
     4bd:       48 8b b0 08 04 00 00    mov    0x408(%rax),%rsi
     4c4:       8b 42 1c                mov    0x1c(%rdx),%eax
     4c7:       85 c0                   test   %eax,%eax
     ...
     4d4:       48 89 c8                mov    %rcx,%rax
     4d7:       48 0f b1 72 28          cmpxchg %rsi,0x28(%rdx)
     4dc:       0f 85 d0 00 00 00       jne    5b2 <...>
     ...
     5b2:       48 89 c1                mov    %rax,%rcx
     5b5:       e9 fc fe ff ff          jmp    4b6 <...>

Please note that in addition to the removed compare, the load from
0x28(%rdx) is moved out of the loop and the code is rearranged
according to the likely/unlikely tags in the source.
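
For readers unfamiliar with the two loop shapes behind the listings
above, here is a minimal user-space sketch of the difference. It is
not the kernel code: the sketch_* helpers and reserve_* functions are
hypothetical names, and C11 atomics stand in for the local_t
primitives (they are stronger than what local_t requires). The only
point is that the try_cmpxchg form hands back the observed value on
failure, so the loop needs neither a reload nor an explicit compare.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for cmpxchg(): returns the value found at *ptr. */
static long sketch_cmpxchg(atomic_long *ptr, long old, long newval)
{
	/* On failure 'old' is overwritten with the value found at *ptr;
	 * on success it already equals that value. */
	atomic_compare_exchange_strong(ptr, &old, newval);
	return old;
}

/* Hypothetical stand-in for try_cmpxchg(): true on success,
 * *old refreshed with the observed value on failure. */
static bool sketch_try_cmpxchg(atomic_long *ptr, long *old, long newval)
{
	return atomic_compare_exchange_strong(ptr, old, newval);
}

/* Old shape: reload 'head' on every iteration and compare the result. */
static long reserve_cmpxchg(atomic_long *head, long size)
{
	long offset, next;

	do {
		offset = atomic_load(head);
		next = offset + size;
	} while (sketch_cmpxchg(head, offset, next) != offset);

	return offset;
}

/* New shape: load 'head' once; a failed attempt updates 'offset'. */
static long reserve_try_cmpxchg(atomic_long *head, long size)
{
	long offset = atomic_load(head);
	long next;

	do {
		next = offset + size;
	} while (!sketch_try_cmpxchg(head, &offset, next));

	return offset;
}
```

Both functions return the offset that was reserved and advance the
head by 'size'; only the retry path differs.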
---
v2:

Implement target-specific support for local_try_cmpxchg and
local_cmpxchg using typed C wrappers that call their _local
counterparts and provide additional checking of their input
arguments.
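
As a rough illustration of the wrapper idea (a user-space sketch under
assumptions, not the code in the patches): sketch_local_t,
sketch_try_cmpxchg_local and sketch_local_try_cmpxchg are hypothetical
names, and a C11 atomic stands in for the underlying _local primitive.
Routing the macro through a typed inline function lets the compiler
check the argument types at every call site.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical local_t-like container. */
typedef struct {
	atomic_long counter;
} sketch_local_t;

/* Macro form: the argument types are only checked by whatever the
 * expansion happens to require. */
#define sketch_try_cmpxchg_local(ptr, oldp, newval) \
	atomic_compare_exchange_strong((ptr), (oldp), (newval))

/* Typed wrapper: callers must pass a sketch_local_t *, a long * and
 * a long, so mismatched arguments are diagnosed at the call site. */
static inline bool sketch_local_try_cmpxchg(sketch_local_t *l,
					    long *old, long newval)
{
	return sketch_try_cmpxchg_local(&l->counter, old, newval);
}
```

With this shape, passing for example an int * as 'old' draws a
compiler diagnostic instead of being silently accepted by the macro.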

Cc: Richard Henderson <richard.hender...@linaro.org>
Cc: Ivan Kokshaysky <i...@jurassic.park.msu.ru>
Cc: Matt Turner <matts...@gmail.com>
Cc: Huacai Chen <chenhua...@kernel.org>
Cc: WANG Xuerui <ker...@xen0n.name>
Cc: Thomas Bogendoerfer <tsbog...@alpha.franken.de>
Cc: Michael Ellerman <m...@ellerman.id.au>
Cc: Nicholas Piggin <npig...@gmail.com>
Cc: Christophe Leroy <christophe.le...@csgroup.eu>
Cc: Thomas Gleixner <t...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Dave Hansen <dave.han...@linux.intel.com>
Cc: "H. Peter Anvin" <h...@zytor.com>
Cc: Arnd Bergmann <a...@arndb.de>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Arnaldo Carvalho de Melo <a...@kernel.org>
Cc: Mark Rutland <mark.rutl...@arm.com>
Cc: Alexander Shishkin <alexander.shish...@linux.intel.com>
Cc: Jiri Olsa <jo...@kernel.org>
Cc: Namhyung Kim <namhy...@kernel.org>
Cc: Ian Rogers <irog...@google.com>
Cc: Will Deacon <w...@kernel.org>
Cc: Boqun Feng <boqun.f...@gmail.com>
Cc: Jiaxun Yang <jiaxun.y...@flygoat.com>
Cc: Jun Yi <yi...@loongson.cn>

Uros Bizjak (5):
  locking/atomic: Add generic try_cmpxchg{,64}_local support
  locking/generic: Wire up local{,64}_try_cmpxchg
  locking/arch: Wire up local_try_cmpxchg
  locking/x86: Define arch_try_cmpxchg_local
  events: Illustrate the transition to local{,64}_try_cmpxchg

 arch/alpha/include/asm/local.h              | 12 +++++++++--
 arch/loongarch/include/asm/local.h          | 13 +++++++++--
 arch/mips/include/asm/local.h               | 13 +++++++++--
 arch/powerpc/include/asm/local.h            | 11 ++++++++++
 arch/x86/events/core.c                      |  9 ++++----
 arch/x86/include/asm/cmpxchg.h              |  6 ++++++
 arch/x86/include/asm/local.h                | 13 +++++++++--
 include/asm-generic/local.h                 |  1 +
 include/asm-generic/local64.h               | 12 ++++++++++-
 include/linux/atomic/atomic-arch-fallback.h | 24 ++++++++++++++++++++-
 include/linux/atomic/atomic-instrumented.h  | 20 ++++++++++++++++-
 kernel/events/ring_buffer.c                 |  5 +++--
 scripts/atomic/gen-atomic-fallback.sh       |  4 ++++
 scripts/atomic/gen-atomic-instrumented.sh   |  2 +-
 14 files changed, 126 insertions(+), 19 deletions(-)

-- 
2.39.2
