bryancall opened a new pull request, #12839:
URL: https://github.com/apache/trafficserver/pull/12839

   ## Summary
   
   Adds a new metric `proxy.process.cache.stripe.lock_contention` that counts the number of times a thread fails to acquire a cache stripe mutex and must reschedule. This helps identify cache lock contention when tuning the number of ATS threads relative to the number of cache volumes.
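   
   The following minimal C++ sketch shows what the counter measures. It uses the try-lock-or-reschedule pattern, with `std::mutex` and `std::atomic` standing in for ATS's own lock and stat types. The names `stripe_lock_contention`, `try_stripe_operation`, and `attempt_while_held` are hypothetical. The sketch does not reproduce how `VC_LOCK_RETRY_EVENT()` or `VC_SCHED_LOCK_RETRY()` actually reschedule. It only shows that one failed acquisition maps to one increment.
   
   ```cpp
   #include <atomic>
   #include <cstdint>
   #include <mutex>
   #include <thread>
   
   // Hypothetical stand-in for proxy.process.cache.stripe.lock_contention.
   std::atomic<std::uint64_t> stripe_lock_contention{0};
   
   // Attempt a stripe operation without blocking. If the stripe mutex is
   // busy, count one contention and return false so the caller can
   // reschedule itself and retry later instead of blocking the thread.
   bool try_stripe_operation(std::mutex &stripe_mutex)
   {
     std::unique_lock<std::mutex> lock(stripe_mutex, std::try_to_lock);
     if (!lock.owns_lock()) {
       stripe_lock_contention.fetch_add(1, std::memory_order_relaxed);
       return false;
     }
     // ... cache work done while holding the stripe lock ...
     return true;
   }
   
   // Hold the mutex on this thread and make one attempt from a second
   // thread. The attempt cannot acquire the lock, so it bumps the counter.
   bool attempt_while_held(std::mutex &stripe_mutex)
   {
     bool acquired = true;
     std::lock_guard<std::mutex> holder(stripe_mutex); // simulate a busy stripe
     std::thread worker([&] { acquired = try_stripe_operation(stripe_mutex); });
     worker.join(); // join makes `acquired` safe to read afterwards
     return acquired;
   }
   ```
   
   Calling `attempt_while_held()` on an unlocked `std::mutex` returns `false` and raises `stripe_lock_contention` by exactly one, because the worker thread cannot take a mutex that the calling thread already holds.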
   
   Also available per-volume as 
`proxy.process.cache.volume_N.stripe.lock_contention`.
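   
   The following minimal sketch shows how a global counter and per-volume counters might be kept side by side. It assumes that each failed attempt bumps both the global metric and the counter of the volume that owns the stripe. The naming suggests this, but it is not confirmed here. `StripeContentionStats` and its 0-based volume index are hypothetical, not the `CacheStatsBlock` layout from this PR.
   
   ```cpp
   #include <atomic>
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <vector>
   
   // Hypothetical sketch: one global contention counter plus one per volume.
   class StripeContentionStats
   {
   public:
     explicit StripeContentionStats(std::size_t volume_count) : per_volume_(volume_count) {}
   
     // Record one failed stripe-lock attempt on `volume` (0-based index).
     // Both counters are bumped, so once updates stop, the global value
     // equals the sum of the per-volume values.
     void record(std::size_t volume)
     {
       assert(volume < per_volume_.size());
       per_volume_[volume].fetch_add(1, std::memory_order_relaxed);
       global_.fetch_add(1, std::memory_order_relaxed);
     }
   
     std::uint64_t global() const { return global_.load(std::memory_order_relaxed); }
     std::uint64_t volume(std::size_t v) const { return per_volume_.at(v).load(std::memory_order_relaxed); }
   
   private:
     std::atomic<std::uint64_t> global_{0};
     std::vector<std::atomic<std::uint64_t>> per_volume_; // value-initialized to 0
   };
   ```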
   
   ## Background
   
   When ATS is configured with more threads than cache volumes, threads contend heavily for the stripe mutex and throughput degrades. This metric makes that contention visible so operators can tune their configuration.
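   
   A toy model can reproduce the effect of the thread-to-volume ratio. The C++ sketch below is not ATS code. `simulate()`, the round-robin key-to-stripe mapping, and the use of `std::this_thread::yield()` as a stand-in for rescheduling are all assumptions made for illustration. Each worker needs the mutex of the stripe its key maps to, and every failed `try_lock()` is counted.
   
   ```cpp
   #include <atomic>
   #include <cassert>
   #include <cstddef>
   #include <cstdint>
   #include <mutex>
   #include <thread>
   #include <vector>
   
   struct ContentionResult {
     std::uint64_t completed;   // operations that ran under a stripe lock
     std::uint64_t contentions; // failed try_lock() attempts
   };
   
   // Toy model: `threads` workers each perform `ops` operations. Operation i
   // of worker t maps round-robin to stripe (t + i) % stripe_count and must
   // hold that stripe's mutex. A failed try_lock() is counted, and the worker
   // yields before retrying.
   ContentionResult simulate(unsigned threads, std::size_t stripe_count, unsigned ops)
   {
     assert(stripe_count > 0);
     std::vector<std::mutex> stripes(stripe_count);
     std::vector<std::uint64_t> work(stripe_count, 0); // work[i] guarded by stripes[i]
     std::atomic<std::uint64_t> contentions{0};
   
     std::vector<std::thread> pool;
     for (unsigned t = 0; t < threads; ++t) {
       pool.emplace_back([&, t] {
         for (unsigned i = 0; i < ops; ++i) {
           std::size_t s = (static_cast<std::size_t>(t) + i) % stripe_count;
           while (!stripes[s].try_lock()) {
             contentions.fetch_add(1, std::memory_order_relaxed);
             std::this_thread::yield(); // stand-in for rescheduling the event
           }
           ++work[s];
           stripes[s].unlock();
         }
       });
     }
     for (auto &th : pool) {
       th.join();
     }
   
     std::uint64_t completed = 0;
     for (auto w : work) {
       completed += w;
     }
     return {completed, contentions.load()};
   }
   ```
   
   `simulate(4, 16, 100000)` will usually report far fewer contentions than `simulate(4, 1, 100000)`. The counts depend on scheduling and core count and vary from run to run, so only `completed` is deterministic.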
   
   ## Benchmark Results
   
   Tested on a 16-core system with 100 cached URLs:
   
   | Threads | Volumes | Throughput | Contentions/s |
   |---------|---------|------------|---------------|
   | 16 | 1 | 476k req/s | 12,095k |
   | 16 | 16 | 1,160k req/s | 177k |
   | 24 | 32 | 1,260k req/s | 161k |
   
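   With only 1 volume, running 16 threads is **slower** than running 4 threads because of contention (the 4-thread run is not shown in the table). At 16 threads, going from 1 to 16 volumes cuts contentions from about 12.1M/s to 177k/s. It also raises throughput about 2.4x, from 476k to 1,160k req/s, largely removing the bottleneck.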
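   
   Dividing the contention rate by the throughput gives an average number of failed lock attempts per request, which makes rows with different throughput easier to compare. The minimal C++ sketch below uses figures from the benchmark table above. `BenchRow` and `contentions_per_request()` are hypothetical helpers.
   
   ```cpp
   #include <cassert>
   
   // One row of the benchmark table (k = thousands).
   struct BenchRow {
     int threads;
     int volumes;
     double throughput_k;  // requests per second, in thousands
     double contentions_k; // stripe lock contentions per second, in thousands
   };
   
   // Average contentions per completed request. Because both rates share the
   // same time base, the units cancel.
   double contentions_per_request(const BenchRow &row)
   {
     assert(row.throughput_k > 0.0);
     return row.contentions_k / row.throughput_k;
   }
   ```
   
   For the three rows, this gives about 25.4, 0.15, and 0.13 contentions per request. With a single volume, the stripe lock was found busy roughly 25 times per request served.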
   
   ## Usage
   
   ```bash
   # Global contention counter
   traffic_ctl metric get proxy.process.cache.stripe.lock_contention
   
   # Per-volume counters
   traffic_ctl metric match 'volume.*stripe.lock_contention'
   ```
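   
   The table reports contentions per second, while the metric itself counts events. Assume the counter is cumulative, since it "counts the number of times" a lock attempt fails. A per-second rate can then be derived from two `traffic_ctl metric get` readings taken a known interval apart. The minimal C++ sketch below shows this. `per_second_rate()` is a hypothetical helper, and this is not necessarily how the benchmark numbers above were produced.
   
   ```cpp
   #include <cassert>
   #include <cstdint>
   
   // Average per-second rate from two readings of a monotonically increasing
   // counter taken `elapsed_seconds` apart.
   double per_second_rate(std::uint64_t earlier, std::uint64_t later, double elapsed_seconds)
   {
     assert(elapsed_seconds > 0.0);
     assert(later >= earlier); // a decrease would suggest a restart or counter reset
     return static_cast<double>(later - earlier) / elapsed_seconds;
   }
   ```
   
   For example, readings of 1,000 and 6,000 taken 10 seconds apart give 500 contentions/s.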
   
   ## Changes
   
   - `P_CacheStats.h`: Add `stripe_lock_contention` counter to `CacheStatsBlock`
   - `CacheProcessor.cc`: Register the metric
   - `P_CacheInternal.h`: Increment counter in `VC_LOCK_RETRY_EVENT()` and 
`VC_SCHED_LOCK_RETRY()`

