Implementation of out-of-line static calls for the PowerPC 64-bit ELF V2 ABI. Static calls patch an indirect branch into a direct branch at runtime. In the out-of-line variant, the caller calls a trampoline directly, and the trampoline is patched to branch directly to the target.
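
The caller/trampoline/target relationship described above can be sketched as a small userspace model. This is only an illustration under stated assumptions, not the kernel implementation: the names sc_tramp, sc_update and sc_call are hypothetical, and the trampoline is modelled as a slot holding the current target rather than as a patched branch instruction, so the model itself still makes an indirect call.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Userspace model only. In the kernel the trampoline is a patched branch
 * instruction; here it is modelled as a slot holding the current target,
 * so the call below is still an indirect call in the generated code.
 * All names (sc_tramp, sc_update, sc_call) are hypothetical.
 */
typedef int (*sc_fn_t)(int);

struct sc_tramp {
	sc_fn_t target;	/* stands in for the branch patched into the trampoline */
};

static int add_one(int x) { return x + 1; }
static int add_two(int x) { return x + 2; }

/* Callers only ever reference the trampoline, never the target. */
static struct sc_tramp demo_tramp = { .target = add_one };

/* Models patching the trampoline: retargets every caller at once. */
static void sc_update(struct sc_tramp *t, sc_fn_t fn)
{
	assert(fn != NULL);
	t->target = fn;
}

/* Models a call site: a fixed call to the trampoline. */
static int sc_call(struct sc_tramp *t, int arg)
{
	return t->target(arg);
}
```

In this model, sc_call(&demo_tramp, 1) reaches add_one until sc_update(&demo_tramp, add_two) runs, after which the same call site reaches add_two. The point is that only the trampoline changes; the call sites are never touched.
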
More context regarding the challenges with the ELF V2 ABI is in the RFC
https://lore.kernel.org/linuxppc-dev/20220901055823.152983-1-bg...@linux.ibm.com/

This resolves the stack issue in the RFC by marking the trampoline as not preserving the TOC, so the linker inserts its own TOC-saving trampoline and restores the TOC when the target returns. It is sub-optimal (a separate TOC-saving trampoline is not necessary), but it does not require any support beyond what the ABI already provides (unlike the other two suggestions in the RFC).

Microbenchmarking shows a performance improvement in kernel-kernel-kernel calls on a Power9 when the indirect branch predictor is disabled. However, the generic implementation performs better in every other case. When branch prediction is enabled, the generic implementation behaves like the control cases.

| Case       | Generic         | Static          |
|------------|-----------------|-----------------|
| control_kk | 221536 calls/ms | 221443 calls/ms | // control is direct call, no SC trampoline
| control_mm | 221941 calls/ms | 221913 calls/ms |
| kkk        |  89657 calls/ms | 177835 calls/ms | // kernel caller -> kernel tramp -> kernel target
| kkm        |  89835 calls/ms |  53853 calls/ms | // kernel caller -> kernel tramp -> module target
| kmk        | 101808 calls/ms |  52280 calls/ms | // etc.
| kmm        | 101973 calls/ms |  52347 calls/ms |
| mkk        |  97621 calls/ms |  78044 calls/ms |
| mkm        |  97738 calls/ms |  38370 calls/ms |
| mmk        |  98839 calls/ms |  68436 calls/ms |
| mmm        |  98967 calls/ms |  68511 calls/ms |

The benchmark uses a noinline, page-aligned target that adds 1 to a counter and then runs 64 NOPs to iron out some processor timing quirks. The target is called in a loop like

    while (!READ_ONCE(stop))
        static_call(bench_sc)(&counter);

The loop is also page aligned. The benchmark is stopped by a timer.
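
As a rough aid for reading the table, the static figure can be expressed relative to the generic one. The helper below is a hypothetical sketch (sc_permille is not part of the series) that uses integer math and the calls/ms values copied from the kkk and kkm rows.

```c
#include <assert.h>

/*
 * Static throughput relative to generic, in parts per thousand
 * (1000 means equal). Hypothetical helper, integer math only.
 */
static unsigned long sc_permille(unsigned long generic, unsigned long sc)
{
	assert(generic > 0);
	return sc * 1000UL / generic;
}

/* Figures copied from the kkk and kkm rows of the table (calls/ms). */
static unsigned long kkk_permille(void) { return sc_permille(89657, 177835); }
static unsigned long kkm_permille(void) { return sc_permille(89835, 53853); }
```

kkk_permille() evaluates to 1983, so static calls reach roughly 1.98x the generic throughput in the kernel-kernel-kernel case, while kkm_permille() evaluates to 599, roughly 0.60x, once the target lives in a module.
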
The kernel trampoline's TOC offset is hardcoded because including the asm constants header pulls in an unrelated macro definition with the same name as the enum constant it was generated from, which confuses the compiler when it reaches that enum definition.

Benjamin Gray (6):
  powerpc/code-patching: Implement generic text patching function
  powerpc/module: Handle caller-saved TOC in module linker
  powerpc/module: Optimise nearby branches in ELF V2 ABI stub
  static_call: Move static call selftest to static_call_selftest.c
  powerpc/64: Add support for out-of-line static calls
  powerpc/64: Add tests for out-of-line static calls

 arch/powerpc/Kconfig                     |  22 +-
 arch/powerpc/include/asm/code-patching.h |   2 +
 arch/powerpc/include/asm/static_call.h   |  80 ++++++-
 arch/powerpc/kernel/Makefile             |   4 +-
 arch/powerpc/kernel/module_64.c          |  25 ++-
 arch/powerpc/kernel/static_call.c        | 203 ++++++++++++++++-
 arch/powerpc/kernel/static_call_test.c   | 263 +++++++++++++++++++++++
 arch/powerpc/lib/code-patching.c         | 135 ++++++++----
 kernel/Makefile                          |   1 +
 kernel/static_call_inline.c              |  43 ----
 kernel/static_call_selftest.c            |  41 ++++
 11 files changed, 713 insertions(+), 106 deletions(-)
 create mode 100644 arch/powerpc/kernel/static_call_test.c
 create mode 100644 kernel/static_call_selftest.c

--
2.37.3