This series mitigates a severe contention issue in the membarrier system call by replacing the global membarrier_ipi_mutex with per-CPU mutexes for targeted expedited commands.
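
As a rough userspace illustration of the lock selection described above (not the kernel code from Patch 1), the sketch below uses pthread mutexes in place of kernel mutexes. All sketch_* names and SKETCH_NR_CPUS are hypothetical and exist only for this example. It assumes a targeted call with an out-of-range CPU is rejected before any lock is taken.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 4	/* hypothetical CPU count, illustration only */

/* One global lock for broadcast commands. */
static pthread_mutex_t sketch_global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* One lock per CPU for targeted commands. */
static pthread_mutex_t sketch_cpu_mutexes[SKETCH_NR_CPUS] = {
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
	PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
};

/*
 * Pick the lock that serializes a command.
 * Broadcast (targeted == 0): the global lock.
 * Targeted: the lock of cpu_id, or NULL if cpu_id is out of range.
 * The unsigned cast makes a negative cpu_id compare as a huge value,
 * so one comparison rejects both negative and too-large ids.
 */
static pthread_mutex_t *sketch_pick_lock(int targeted, int cpu_id)
{
	if (!targeted)
		return &sketch_global_mutex;
	if ((unsigned int)cpu_id >= SKETCH_NR_CPUS)
		return NULL;
	return &sketch_cpu_mutexes[cpu_id];
}

/*
 * Run one simulated command under the selected lock.
 * Returns 0 on success, -1 if the target CPU is invalid.
 */
static int sketch_run(int targeted, int cpu_id)
{
	pthread_mutex_t *lock = sketch_pick_lock(targeted, cpu_id);

	if (!lock)
		return -1;

	pthread_mutex_lock(lock);
	/* The IPI to the target CPU(s) would be sent and awaited here. */
	pthread_mutex_unlock(lock);
	return 0;
}
```

In this model, a caller stalled while holding the lock of one CPU delays only other callers targeting that same CPU; callers targeting other CPUs and broadcast callers take different locks.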

Problem:
Currently, MEMBARRIER_CMD_PRIVATE_EXPEDITED_RSEQ relies on a single global
mutex to serialize IPIs. On large systems with heavy concurrent usage, this
creates significant contention. The issue cascades into hard lockups when
combined with CFS bandwidth throttling, during which target CPUs may have
interrupts disabled for extended periods. If membarrier is waiting on such a
CPU, it holds the global mutex, stalling all other membarrier callers
system-wide.

Solution:
Patch 1 introduces membarrier_cpu_mutexes to serialize expedited commands
specifically when a target CPU is provided, isolating the lock contention to
the targeted CPU. Broadcast commands continue to use the global mutex.

Patch 2 introduces a dedicated kselftest reproducer (membarrier_rseq_stress)
to keep this interaction under test. It hammers targeted membarrier commands
within a deep, aggressively throttled cgroup hierarchy to reproduce the lockup
scenario and validate the fix.

Results:
As measured by the stress test introduced in Patch 2, testing on an AMD Turin
machine with 384 CPUs (2 NUMA nodes, SMT=2) shows:

  Throughput: ~200x increase in successful membarrier calls.

---
Changes in v2:
v1: https://lore.kernel.org/lkml/[email protected]
- Use different mutex macros for global vs targeted cpu membarrier (Mathieu).
- Use (unsigned int) cpu_id >= nr_cpu_ids (Peter).
- Removed #warning on unsupported architectures that causes build failures
  with W=1 reported by the kernel test robot.

Aniket Gattani (2):
  sched/membarrier: Use per-CPU mutexes for targeted commands
  selftests/membarrier: Add rseq stress test for CFS throttle interactions

 kernel/sched/membarrier.c                     |  36 +-
 tools/testing/selftests/membarrier/Makefile   |   5 +-
 .../membarrier/membarrier_rseq_stress.c       | 951 ++++++++++++++++++
 3 files changed, 979 insertions(+), 13 deletions(-)
 create mode 100644 tools/testing/selftests/membarrier/membarrier_rseq_stress.c

-- 
2.54.0.rc1.513.gad8abe7a5a-goog

