On 08/05/2026 20.07, Dmitry Ilvokhin wrote:
On Fri, May 08, 2026 at 07:40:51PM +0200, Vlastimil Babka (SUSE) wrote:
On 5/8/26 7:38 PM, Vlastimil Babka (SUSE) wrote:
On 5/8/26 7:29 PM, Andrew Morton wrote:
On Fri, 8 May 2026 18:22:06 +0200 [email protected] wrote:

Add tracepoints to the page allocator fast paths that acquire
zone->lock, allowing diagnosis of lock contention in production.
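A tracepoint of this kind might look roughly as follows. This is a non-compilable sketch; the event name, arguments, and fields are illustrative assumptions, not taken from the patch itself:

```c
/* Illustrative sketch only -- the event name and fields are assumptions,
 * not the patch's actual tracepoint.  The idea: fire when the allocator
 * fast path has to take zone->lock to refill or drain the per-CPU page
 * lists, recording which zone and how many pages were involved.
 */
TRACE_EVENT(mm_page_alloc_zone_locked,

	TP_PROTO(struct zone *zone, unsigned int count),

	TP_ARGS(zone, count),

	TP_STRUCT__entry(
		__field(int, nid)
		__string(name, zone->name)
		__field(unsigned int, count)
	),

	TP_fast_assign(
		__entry->nid = zone_to_nid(zone);
		__assign_str(name);
		__entry->count = count;
	),

	TP_printk("node=%d zone=%s count=%u",
		  __entry->nid, __get_str(name), __entry->count)
);
```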

Thanks, I'm surprised we haven't done this yet.

There was a recent attempt [1]. Not being a generic solution wasn't welcome.

[1] https://lore.kernel.org/all/[email protected]/

And this is the generic solution I think?

https://lore.kernel.org/all/[email protected]/

Thanks for cc'ing me, Vlastimil.

Yes, this is an attempt at a generic solution for tracing contended
locks, including spinlocks, so it should also cover the use case
proposed in this patchset.


I'm aware of the generic solution and often use `perf lock contention`,
as well as the libbpf-tools/klockstat tool. My experience is
unfortunately that enabling these tracepoints is prohibitively expensive
on production servers, and production suffers when I run these tools.
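For reference, the `perf lock contention` workflow mentioned above looks like this (a real tool; the exact flags and duration here are just one common invocation):

```shell
# Sample contended kernel locks system-wide (-a) for 10 seconds,
# using BPF-based tracing (-b).  The probes this attaches fire on
# every contended lock in the system, which is the source of the
# production overhead described above.
perf lock contention -a -b sleep 10
```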

I'm very happy to see a patchset adding a contended case. But I worry
that tracing all contended locks in the system is also too much to have
enabled continuously in production.

This patch is carefully constructed to minimize overhead, such that I
can enable it continuously in production to catch issues. If I identify
an issue, I will use the generic tracepoints for further debugging.
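Enabling a single, narrowly-scoped event like this continuously could be done through tracefs along these lines (the event name and group here are hypothetical; the actual names come from the patch):

```shell
# Hypothetical event path -- the real tracepoint names are defined by
# the patch.  Enabling one specific event is far cheaper than tracing
# every contended lock in the system.
echo 1 > /sys/kernel/tracing/events/kmem/mm_page_alloc_zone_locked/enable
cat /sys/kernel/tracing/trace_pipe
```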


In fact, zone->lock contention was one of the primary motivations for
this work.

With the generic solution I'm losing the "zone" and pages "count". I
need this information to get the answers I'm looking for. Specifically,
I'm looking at reducing CONFIG_PCP_BATCH_SCALE_MAX, but I want this to
be a data-driven decision (my first principle: if you cannot measure
it, you cannot improve it).

I'm likely going to apply this patch to our production systems, so that
I can make that data-driven decision. I need to deploy it widely enough
to get enough servers experiencing direct reclaim. I'll report back if
people are interested in these learnings.

--Jesper
