On 2025-07-25 13:44, Nysal Jan K.A. wrote:
Add a lock contention tracepoint in the queued spinlock slowpath.
Also add the __lockfunc annotation so that in_lock_functions()
works as expected.

Signed-off-by: Nysal Jan K.A. <ny...@linux.ibm.com>
---
 arch/powerpc/lib/qspinlock.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)
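
For context, not part of the change: in_lock_functions() is a plain
address-range check against the .spinlock.text section that __lockfunc
places code in, so the slowpath is only attributed to lock code once it
carries the annotation. Roughly, from kernel/locking/spinlock.c:

    notrace int in_lock_functions(unsigned long addr)
    {
            /* Linker-provided bounds of the __lockfunc (.spinlock.text) section */
            extern char __lock_text_start[], __lock_text_end[];

            return addr >= (unsigned long)__lock_text_start
            && addr < (unsigned long)__lock_text_end;
    }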

diff --git a/arch/powerpc/lib/qspinlock.c b/arch/powerpc/lib/qspinlock.c
index bcc7e4dff8c3..622e7f45c2ce 100644
--- a/arch/powerpc/lib/qspinlock.c
+++ b/arch/powerpc/lib/qspinlock.c
@@ -9,6 +9,7 @@
 #include <linux/sched/clock.h>
 #include <asm/qspinlock.h>
 #include <asm/paravirt.h>
+#include <trace/events/lock.h>

 #define MAX_NODES      4

@@ -708,8 +709,9 @@ static __always_inline void queued_spin_lock_mcs_queue(struct qspinlock *lock, b
        qnodesp->count--;
 }

-void queued_spin_lock_slowpath(struct qspinlock *lock)
+void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock)
 {
+       trace_contention_begin(lock, LCB_F_SPIN);
        /*
         * This looks funny, but it induces the compiler to inline both
         * sides of the branch rather than share code as when the condition
@@ -718,16 +720,17 @@ void queued_spin_lock_slowpath(struct qspinlock *lock)
        if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) && is_shared_processor()) {
                if (try_to_steal_lock(lock, true)) {
                        spec_barrier();
-                       return;
+               } else {
+                       queued_spin_lock_mcs_queue(lock, true);
                }
-               queued_spin_lock_mcs_queue(lock, true);
        } else {
                if (try_to_steal_lock(lock, false)) {
                        spec_barrier();
-                       return;
+               } else {
+                       queued_spin_lock_mcs_queue(lock, false);
                }
-               queued_spin_lock_mcs_queue(lock, false);
        }
+       trace_contention_end(lock, 0);
 }
 EXPORT_SYMBOL(queued_spin_lock_slowpath);
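
The control-flow change above replaces the early returns with else
branches so that both the lock-stealing and the MCS-queueing paths fall
through to a single trace_contention_end(). Reconstructed from the hunks
(comment abridged), the resulting function reads roughly:

    void __lockfunc queued_spin_lock_slowpath(struct qspinlock *lock)
    {
            trace_contention_begin(lock, LCB_F_SPIN);
            /*
             * This looks funny, but it induces the compiler to inline both
             * sides of the branch rather than share code as when the condition
             * ...
             */
            if (IS_ENABLED(CONFIG_PARAVIRT_SPINLOCKS) && is_shared_processor()) {
                    if (try_to_steal_lock(lock, true)) {
                            spec_barrier();
                    } else {
                            queued_spin_lock_mcs_queue(lock, true);
                    }
            } else {
                    if (try_to_steal_lock(lock, false)) {
                            spec_barrier();
                    } else {
                            queued_spin_lock_mcs_queue(lock, false);
                    }
            }
            trace_contention_end(lock, 0);
    }

LCB_F_SPIN (from trace/events/lock.h) tags the waiter as a spinning lock
in the contention_begin flags; the 0 passed to contention_end reports a
successful acquisition, which is always the case here since this slowpath
cannot fail.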

Hello,

I have verified the patch with the latest upstream Linux kernel, and here are my findings:

———Kernel Version———
6.16.0-rc7-160000.11-default+
———perf --version———
perf version 6.16.rc7.g5f33ebd2018c

To test this patch, I used the Lockstorm benchmark, which rigorously exercises spinlocks from kernel space.

Benchmark repository: https://github.com/lop-devops/lockstorm
To capture all events related to the Lockstorm benchmark, I used the following command:
cmd: perf lock record -a insmod lockstorm.ko
After generating the perf.data, I analyzed the results using:
cmd: perf lock contention -a -i perf.data
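
As an additional sanity check that the new tracepoints fire, the
lock:contention_begin/lock:contention_end events can also be enabled
directly through tracefs, independent of perf lock (assuming tracefs is
mounted at the usual /sys/kernel/tracing):
cmd: echo 1 > /sys/kernel/tracing/events/lock/contention_begin/enable
cmd: echo 1 > /sys/kernel/tracing/events/lock/contention_end/enable
cmd: cat /sys/kernel/tracing/trace_pipe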

————Logs————
contended   total wait     max wait     avg wait         type   caller

  6187241      12.50 m      2.30 ms    121.22 us     spinlock   kthread+0x160
       78      8.23 ms    209.87 us    105.47 us     rwlock:W   do_exit+0x378
       71      7.97 ms    208.07 us    112.24 us     spinlock   do_exit+0x378
       68      4.18 ms    210.04 us     61.43 us     rwlock:W   release_task+0xe0
       63      3.96 ms    204.02 us     62.90 us     spinlock   release_task+0xe0
      115    477.15 us     19.69 us      4.15 us     spinlock   rcu_report_qs_rdp+0x40
      250    437.34 us      5.34 us      1.75 us     spinlock   raw_spin_rq_lock_nested+0x24
       32    156.32 us     13.56 us      4.88 us     spinlock   cgroup_exit+0x34
       19     88.12 us     12.20 us      4.64 us     spinlock   exit_fs+0x44
       12     23.25 us      3.09 us      1.94 us     spinlock   lock_hrtimer_base+0x4c
        1     18.83 us     18.83 us     18.83 us      rwsem:R   btrfs_tree_read_lock_nested+0x38
        1     17.84 us     17.84 us     17.84 us      rwsem:W   btrfs_tree_lock_nested+0x38
       10     15.75 us      5.72 us      1.58 us     spinlock   raw_spin_rq_lock_nested+0x24
        5     15.08 us      5.59 us      3.02 us     spinlock   mix_interrupt_randomness+0xb4
        2     12.78 us      9.50 us      4.26 us     spinlock   raw_spin_rq_lock_nested+0x24
        1     11.13 us     11.13 us     11.13 us     spinlock   __queue_work+0x338
        3     10.79 us      7.04 us      3.60 us     spinlock   raw_spin_rq_lock_nested+0x24
        3      8.17 us      4.58 us      2.72 us     spinlock   raw_spin_rq_lock_nested+0x24
        3      7.99 us      3.13 us      2.66 us     spinlock   lock_hrtimer_base+0x4c
        2      6.66 us      4.57 us      3.33 us     spinlock   free_pcppages_bulk+0x50
        3      5.34 us      2.19 us      1.78 us     spinlock   ibmvscsi_handle_crq+0x1e4
        2      3.71 us      2.32 us      1.85 us     spinlock   __hrtimer_run_queues+0x1b8
        2      2.98 us      2.19 us      1.49 us     spinlock   raw_spin_rq_lock_nested+0x24
        1      2.85 us      2.85 us      2.85 us     spinlock   raw_spin_rq_lock_nested+0x24
        2      2.15 us      1.09 us      1.07 us     spinlock   raw_spin_rq_lock_nested+0x24
        2      2.06 us      1.06 us      1.03 us     spinlock   raw_spin_rq_lock_nested+0x24
        1      1.69 us      1.69 us      1.69 us     spinlock   raw_spin_rq_lock_nested+0x24
        1      1.53 us      1.53 us      1.53 us     spinlock   __queue_work+0xd8
        1      1.27 us      1.27 us      1.27 us     spinlock   pull_rt_task+0xa0
        1      1.16 us      1.16 us      1.16 us     spinlock   raw_spin_rq_lock_nested+0x24
        1       740 ns       740 ns       740 ns     spinlock   add_device_randomness+0x5c
        1       566 ns       566 ns       566 ns     spinlock   raw_spin_rq_lock_nested+0x24

From the results, we can observe lock contention being reported for spinlocks, dominated by the kthread+0x160 entry (~6.2 million contended acquisitions, 12.50 minutes of total wait) generated while the Lockstorm module was loaded.

The patch works as expected.
Thank you for the patch!

Tested-by: Samir Mulani <sa...@linux.ibm.com>
