On Mon, Mar 09, 2026 at 01:10:46PM +0000, Matthew Wilcox wrote:
> On Fri, Feb 27, 2026 at 04:00:22PM +0000, Dmitry Ilvokhin wrote:
> > Zone lock contention can significantly impact allocation and
> > reclaim latency, as it is a central synchronization point in
> > the page allocator and reclaim paths. Improved visibility into
> > its behavior is therefore important for diagnosing performance
> > issues in memory-intensive workloads.
> > 
> > On some production workloads at Meta, we have observed noticeable
> > zone lock contention. Deeper analysis of lock holders and waiters
> > is currently difficult with existing instrumentation.
> > 
> > While generic lock contention_begin/contention_end tracepoints
> > cover the slow path, they do not provide sufficient visibility
> > into lock hold times. In particular, the lack of a release-side
> > event makes it difficult to identify long lock holders and
> > correlate them with waiters. As a result, distinguishing between
> > short bursts of contention and pathological long hold times
> > requires additional instrumentation.
> > 
> > This patch series adds dedicated tracepoint instrumentation to
> > zone lock, following the existing mmap_lock tracing model.
> 
> I don't like this at all.  We have CONFIG_LOCK_STAT.  That should be
> improved instead of coming up with one-offs for every single lock
> that someone deems "special".

Thanks for the feedback, Matthew.

CONFIG_LOCK_STAT provides useful statistics, but it is primarily a
debug facility and is generally too heavyweight for production
environments.

The motivation for this series was to provide lightweight observability
for the zone lock in production workloads.

I agree that improving generic lock instrumentation would be preferable.
I did consider whether something similar could be done generically for
spinlocks, but the unlock path there is typically just a single atomic
store, so adding generic lightweight instrumentation without affecting
the fast path is difficult.
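To make the spinlock point concrete, here is a toy test-and-set spinlock written with C11 atomics. It is purely illustrative and deliberately not the kernel's qspinlock; the names toy_spinlock, toy_lock and toy_unlock are hypothetical. Contention is only observable in the acquire loop, while the release side is a single store that never checks whether anyone is spinning, so a generic release-side hook would have to add work to every unlock, contended or not.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy test-and-set spinlock; illustration only, not qspinlock. */
struct toy_spinlock {
	atomic_int locked;	/* 0 = free, 1 = held */
};

static void toy_lock(struct toy_spinlock *l)
{
	int expected = 0;

	/* Spin until the 0 -> 1 transition succeeds. Contention is only
	 * visible in this loop, which is where begin/end style events fit. */
	while (!atomic_compare_exchange_weak_explicit(&l->locked, &expected, 1,
						      memory_order_acquire,
						      memory_order_relaxed))
		expected = 0;	/* a failed CAS stored the observed value; reset */
}

static void toy_unlock(struct toy_spinlock *l)
{
	/* The whole release path: one store with release ordering.
	 * It never looks at whether anyone is spinning, so a release-side
	 * hook would add work to every unlock, contended or not. */
	atomic_store_explicit(&l->locked, 0, memory_order_release);
}

/* Single-threaded usage check. */
static void toy_spinlock_selftest(void)
{
	struct toy_spinlock l = { 0 };

	toy_lock(&l);
	assert(atomic_load(&l.locked) == 1);
	toy_unlock(&l);
	assert(atomic_load(&l.locked) == 0);
}
```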

In parallel, I've been experimenting with improving observability for
sleepable locks by adding a contended_release tracepoint, which would
allow correlating lock holders and waiters in a more generic way. I've
posted an RFC here:

https://lore.kernel.org/all/[email protected]/

I'd appreciate feedback on whether that direction makes sense for
improving the generic lock tracing infrastructure.
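By contrast, a sleepable lock already has to notice waiters on release in order to wake them. The toy below models that with a waiter flag in the owner word; it is hypothetical and is neither the kernel's mutex code nor the RFC itself, and the flag layout and function names (toy_mutex, TOY_FLAG_WAITERS, trace_toy_contended_release) are assumptions for illustration. The uncontended unlock is a single compare-and-exchange, and only when the flag is set does it fall into a slow path, which is one plausible place where a contended_release-style event could be emitted without touching the fast path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical flag in the low bit of the owner word: set while tasks wait. */
#define TOY_FLAG_WAITERS ((uintptr_t)1)

struct toy_mutex {
	atomic_uintptr_t owner;	/* 0 = unlocked, else owner id | flags */
};

/* Hypothetical stand-in for a tracepoint: just counts events. */
static unsigned long contended_release_events;

static void trace_toy_contended_release(struct toy_mutex *m)
{
	(void)m;
	contended_release_events++;
}

static void toy_mutex_unlock_slowpath(struct toy_mutex *m)
{
	/* Only reached when waiters exist, so the event adds nothing
	 * to the uncontended path. */
	trace_toy_contended_release(m);
	/* A real lock would hand off to or wake a waiter here. */
	atomic_store_explicit(&m->owner, 0, memory_order_release);
}

static void toy_mutex_unlock(struct toy_mutex *m, uintptr_t self)
{
	uintptr_t expected = self;

	/* Fast path: succeeds only if the owner word is exactly `self`,
	 * i.e. the waiter flag is clear. */
	if (atomic_compare_exchange_strong_explicit(&m->owner, &expected, 0,
						    memory_order_release,
						    memory_order_relaxed))
		return;
	toy_mutex_unlock_slowpath(m);
}

/* Single-threaded usage check: owner state is set directly. */
static void toy_mutex_selftest(void)
{
	struct toy_mutex m = { 0 };
	const uintptr_t self = 0x10;	/* hypothetical owner id, low bit clear */

	/* Uncontended: fast path, no event. */
	atomic_store(&m.owner, self);
	toy_mutex_unlock(&m, self);
	assert(contended_release_events == 0);
	assert(atomic_load(&m.owner) == 0);

	/* Contended: waiter flag set, fast path fails, event fires. */
	atomic_store(&m.owner, self | TOY_FLAG_WAITERS);
	toy_mutex_unlock(&m, self);
	assert(contended_release_events == 1);
	assert(atomic_load(&m.owner) == 0);
}
```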
