bryancall opened a new pull request, #12839: URL: https://github.com/apache/trafficserver/pull/12839
## Summary

Adds a new metric `proxy.process.cache.stripe.lock_contention` that counts the number of times a thread fails to acquire the cache stripe mutex and must reschedule. This helps identify cache lock contention when tuning ATS thread counts vs cache volume counts.

Also available per-volume as `proxy.process.cache.volume_N.stripe.lock_contention`.

## Background

When ATS is configured with more threads than cache volumes, threads contend heavily for the stripe mutex, causing throughput degradation. This metric makes contention visible so operators can tune their configuration.

## Benchmark Results

Testing on a 16-core system with 100 cached URLs:

| Threads | Volumes | Throughput | Contentions/s |
|---------|---------|------------|---------------|
| 16 | 1 | 476k req/s | 12,095k |
| 16 | 16 | 1,160k req/s | 177k |
| 24 | 32 | 1,260k req/s | 161k |

With only 1 volume, 16 threads is **slower** than 4 threads due to contention. Adding volumes eliminates the bottleneck.

## Usage

```bash
# Global contention counter
traffic_ctl metric get proxy.process.cache.stripe.lock_contention

# Per-volume counters
traffic_ctl metric match volume.*stripe.lock_contention
```

## Changes

- `P_CacheStats.h`: Add `stripe_lock_contention` counter to `CacheStatsBlock`
- `CacheProcessor.cc`: Register the metric
- `P_CacheInternal.h`: Increment counter in `VC_LOCK_RETRY_EVENT()` and `VC_SCHED_LOCK_RETRY()`
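
## Example: deriving a contention rate

Since the metric is described as a count of failed lock acquisitions, a per-second figure like the Contentions/s column can presumably be derived by sampling the counter twice and dividing the difference by the elapsed time. The following is a minimal Python sketch of that arithmetic, not part of this PR. The helper names `parse_metric_line` and `contention_rate` are hypothetical, and the sketch assumes each line printed by `traffic_ctl metric get` has the form `<name> <value>`.

```python
def parse_metric_line(line: str) -> int:
    """Extract the integer value from one metric line.

    Assumes the line looks like '<metric name> <value>'; adjust the
    parsing if the output of traffic_ctl differs on your build.
    """
    _name, value = line.split()
    return int(value)


def contention_rate(first: int, second: int, interval_s: float) -> float:
    """Return contentions per second between two counter samples."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (second - first) / interval_s


if __name__ == "__main__":
    # Illustrative sample lines, not real measurements.
    before = parse_metric_line("proxy.process.cache.stripe.lock_contention 1000")
    after = parse_metric_line("proxy.process.cache.stripe.lock_contention 178000")
    print(f"{contention_rate(before, after, 1.0):.0f} contentions/s")
```

In practice the two samples would come from running the `traffic_ctl metric get` command shown under Usage at the start and end of a measurement window.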
