On Wed, May 13, 2026 at 05:32:41PM +0200, Jesper Dangaard Brouer wrote:
>
>
> On 08/05/2026 20.07, Dmitry Ilvokhin wrote:
> > On Fri, May 08, 2026 at 07:40:51PM +0200, Vlastimil Babka (SUSE) wrote:
> > > On 5/8/26 7:38 PM, Vlastimil Babka (SUSE) wrote:
> > > > On 5/8/26 7:29 PM, Andrew Morton wrote:
> > > > > On Fri, 8 May 2026 18:22:06 +0200 [email protected] wrote:
> > > > >
> > > > > > Add tracepoints to the page allocator fast paths that acquire
> > > > > > zone->lock, allowing diagnosis of lock contention in production.
> > > > >
> > > > > Thanks, I'm surprised we haven't done this yet.
> > > >
> > > > There was a recent attempt [1]. Not being a generic solution wasn't
> > > > welcome.
> > > >
> > > > [1] https://lore.kernel.org/all/[email protected]/
> > >
> > > And this is the generic solution I think?
> > >
> > > https://lore.kernel.org/all/[email protected]/
> >
> > Thanks for cc'ing me, Vlastimil.
> >
> > Yes, this is an attempt at a generic solution for tracing contended
> > locks, including spinlocks, so it should also cover the use case
> > proposed in this patchset.
> >
>
> I'm aware of the generic solution and often use `perf lock contention`.
> And the tool libbpf-tools/klockstat. My experience is unfortunately that
> enabling these tracepoint is prohibitive expensive on production server,
> and production suffers when I run these tools.

I think it depends on the workload, in particular on how lock-heavy it
is. At Meta we have a lock contention profiler (it uses the
contention_begin and contention_end tracepoints under the hood) running
continuously in the fleet. It is heavily sampled and each profiling
session runs for only a few seconds, but in practice that is usually
enough to get a pretty good understanding of what is going on.

That said, I understand the concern, and I can absolutely imagine
workloads where the overhead is still unacceptably high.

>
> I'm very happy to see a patchset adding a contended case. But I worry
> that tracing all contented locks in the system is also too much to have
> enabled continuously for production.
>
> This patch is carefully constructed to minimize overhead, such that I
> can enable this continuously on production to catch issues. If I
> identify issue I will use the generic tracpoints for further debugging.
>
>
> > In fact, zone->lock contention was one of the primary motivations for
> > this work.
>
> In the generic solution I'm loosing the "zone" and pages "count". I
> need this information to get the answers I'm looking for. Specifically
> I'm looking at reducing CONFIG_PCP_BATCH_SCALE_MAX, but I want to this
> to be a data-driven decision (my first principle is: if you cannot
> measure it you cannot improve it).
>
> I'm likely going to apply this patch to our production system, such that
> I can get my data-driven decision. I need to deploy it widely enough to
> get enough server experiencing direct-reclaim. I'll report back if
> people are interested in these learning?

I would definitely be interested in hearing about your findings.

>
> --Jesper
