masaori335 opened a new issue, #12788:
URL: https://github.com/apache/trafficserver/issues/12788

   # Summary
   
   We have observed a lock contention issue with the Stripe mutex. This is an 
umbrella issue to track related changes.
   
   # Problem
   
   `Cache::open_read()` suffers from severe lock contention: every read operation 
(including cache lookups) acquires an exclusive lock on `stripe->mutex`, which 
serializes all cache operations on that stripe and limits throughput.
   
   
https://github.com/apache/trafficserver/blob/7e366fa067f4916dbcec802eae049c0aea0acef6/src/iocore/cache/Cache.cc#L344
   
   ## Difficulties
   
   I attempted to use a reader-writer lock instead of a mutex lock, and some 
proof-of-concept tests showed significant performance improvements. However, I 
found that we cannot simply replace this mutex lock with a reader-writer lock 
or a lock-free data structure. The main reason is that `StripeSM`, as a 
Continuation, requires the mutex lock when called from the event system.
   
   https://github.com/apache/trafficserver/pull/12601 is another recent attempt 
by @bryancall.
   
   ### Event Handlers
   
   - Event handlers of StripeSM
   
   
https://github.com/apache/trafficserver/blob/7e366fa067f4916dbcec802eae049c0aea0acef6/src/iocore/cache/StripeSM.h#L118-L122
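
   To illustrate why the handlers pin us to a plain mutex, here is a minimal sketch of the constraint. It assumes a heavily simplified dispatch model: `ContinuationModel`, `DispatchResult`, `dispatch()`, and `contended_dispatch_is_deferred()` are hypothetical names for illustration, not the real event system API. The point is only that a handler runs while its continuation's mutex is held exclusively, so readers that enter through a handler cannot overlap.

```cpp
#include <functional>
#include <memory>
#include <mutex>
#include <thread>

// Hypothetical, simplified stand-in for a Continuation: a handler plus the
// mutex that must be held while the handler runs.
struct ContinuationModel {
  std::shared_ptr<std::mutex> mutex = std::make_shared<std::mutex>();
  std::function<int(int)> handler;
};

enum class DispatchResult { Handled, Retry };

// Simplified dispatch: try to take the continuation's mutex exclusively.
// If the lock is not acquired, the event is not delivered now and the
// caller is expected to retry later.
inline DispatchResult
dispatch(ContinuationModel &c, int event, int &result)
{
  std::unique_lock<std::mutex> lock(*c.mutex, std::try_to_lock);
  if (!lock.owns_lock()) {
    return DispatchResult::Retry;
  }
  result = c.handler(event);
  return DispatchResult::Handled;
}

// Usage example: while one thread holds the mutex, a dispatch attempted
// from another thread is deferred instead of running concurrently.
inline bool
contended_dispatch_is_deferred(ContinuationModel &c)
{
  DispatchResult seen = DispatchResult::Handled;
  int ignored         = 0;
  std::lock_guard<std::mutex> held(*c.mutex);
  std::thread other([&] { seen = dispatch(c, 0, ignored); });
  other.join();
  return seen == DispatchResult::Retry;
}
```

   In this model, swapping the mutex for a reader-writer lock would not help on its own, because the dispatch path itself takes the lock exclusively before any handler code runs.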
   
   ### Dir operations
   
   Some `Dir` functions look like read-only operations, but they actually 
perform writes under some conditions.
   
   - e.g.  `Directory::probe()`
   
   
https://github.com/apache/trafficserver/blob/7e366fa067f4916dbcec802eae049c0aea0acef6/src/iocore/cache/CacheDir.cc#L528-L534
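
   As a sketch of how such a lookup could coexist with a reader-writer lock, the example below takes a shared lock on the fast path and falls back to an exclusive lock only when it has to mutate. `DirectoryModel`, `EntryModel`, and the "stale entry gets deleted" condition are assumptions made for illustration; they are not the actual `Directory` implementation. Note that `std::shared_mutex` cannot upgrade in place, so the state must be re-checked after the exclusive lock is taken.

```cpp
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical directory entry. `valid == false` models an entry that a
// lookup discovers to be stale and therefore has to delete.
struct EntryModel {
  std::uint64_t offset;
  bool valid;
};

class DirectoryModel
{
public:
  void
  insert(std::uint64_t key, EntryModel entry)
  {
    std::unique_lock<std::shared_mutex> lock(rw_lock_);
    entries_[key] = entry;
  }

  // Looks read-only to callers, but may delete a stale entry.
  std::optional<std::uint64_t>
  probe(std::uint64_t key)
  {
    {
      // Fast path: shared lock, no mutation.
      std::shared_lock<std::shared_mutex> lock(rw_lock_);
      auto it = entries_.find(key);
      if (it == entries_.end()) {
        return std::nullopt;
      }
      if (it->second.valid) {
        return it->second.offset;
      }
    }
    // Slow path: the shared lock was released above, so another thread may
    // have changed the entry. Re-check everything under the exclusive lock.
    std::unique_lock<std::shared_mutex> lock(rw_lock_);
    auto it = entries_.find(key);
    if (it == entries_.end()) {
      return std::nullopt;
    }
    if (it->second.valid) {
      return it->second.offset;
    }
    entries_.erase(it);
    return std::nullopt;
  }

  std::size_t
  size() const
  {
    std::shared_lock<std::shared_mutex> lock(rw_lock_);
    return entries_.size();
  }

private:
  mutable std::shared_mutex rw_lock_;
  std::unordered_map<std::uint64_t, EntryModel> entries_;
};
```

   The trade-off is that the write path pays for two lock acquisitions and a second lookup, which is only worthwhile if mutations during a probe are rare.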
   
   # Proposed Solution
   
   Implement a two-tier locking architecture by decoupling `StripeSM` and 
`Stripe`:
   
   1. Separate `StripeSM` (`Continuation`) and `Stripe` (shared data)
   
   `StripeSM` (a Continuation) contains event handlers, while `Stripe` contains 
shared data.
   
   Half of this change has already been completed by #11565 and related PRs, 
but we still need to clarify the separation between event handling and shared 
data access more explicitly.
   
   2. Add Reader-Writer Lock to Stripe
   
   Access to the shared data requires a reader-writer lock to allow concurrent 
reading. Alternatively, making Stripe a lock-free data structure (using RCU or 
Hazard Pointers) is another option.
   
   3. Allocate StripeSM per Transaction
   
   Each cache operation gets a lightweight StripeSM instance with its own mutex 
for event handling. It acquires an RW lock on the shared Stripe for data access.
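
   A rough sketch of the two tiers described in steps 1 to 3, using standard library primitives only. `StripeModel` and `StripeSMModel` are hypothetical, simplified stand-ins for the proposed `Stripe` and per-transaction `StripeSM`; the method names and the in-memory map are assumptions for illustration, not the planned interface.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Shared tier (hypothetical): per-stripe data guarded by a reader-writer
// lock, so lookups from different transactions can proceed concurrently.
class StripeModel
{
public:
  std::optional<std::string>
  lookup(std::uint64_t key) const
  {
    std::shared_lock<std::shared_mutex> lock(rw_lock_);
    auto it = objects_.find(key);
    if (it == objects_.end()) {
      return std::nullopt;
    }
    return it->second;
  }

  void
  store(std::uint64_t key, std::string value)
  {
    std::unique_lock<std::shared_mutex> lock(rw_lock_);
    objects_[key] = std::move(value);
  }

private:
  mutable std::shared_mutex rw_lock_;
  std::unordered_map<std::uint64_t, std::string> objects_;
};

// Per-transaction tier (hypothetical): each cache operation owns one of
// these. Its mutex protects only its own state, so it is never contended
// by other transactions; shared data is reached through the RW lock.
class StripeSMModel
{
public:
  explicit StripeSMModel(StripeModel &stripe) : stripe_(stripe) {}

  std::optional<std::string>
  handle_open_read(std::uint64_t key)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    ++events_handled_;
    return stripe_.lookup(key); // shared (read) lock on the stripe
  }

  void
  handle_open_write(std::uint64_t key, std::string value)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    ++events_handled_;
    stripe_.store(key, std::move(value)); // exclusive (write) lock
  }

  int
  events_handled()
  {
    std::lock_guard<std::mutex> lock(mutex_);
    return events_handled_;
  }

private:
  std::mutex mutex_;
  StripeModel &stripe_;
  int events_handled_ = 0;
};
```

   With this split, two readers on the same stripe each lock their own `StripeSMModel` mutex and then share the stripe's read lock, so only writers serialize access to the shared data.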
   
   ## Architecture Diagram
   
   <img width="3600" height="885" alt="Image" 
src="https://github.com/user-attachments/assets/a1fafefc-fe8c-48b7-a4bf-283f432e1bec" 
/>
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
