masaori335 opened a new issue, #12788: URL: https://github.com/apache/trafficserver/issues/12788
# Summary

We have observed a lock contention issue with the Stripe mutex. This is an umbrella issue to track related changes.

# Problem

`Cache::open_read()` has severe lock contention: every read operation (including cache lookups) acquires an exclusive lock on `stripe->mutex`, which serializes all cache operations and limits throughput.

https://github.com/apache/trafficserver/blob/7e366fa067f4916dbcec802eae049c0aea0acef6/src/iocore/cache/Cache.cc#L344

## Difficulties

I attempted to use a reader-writer lock instead of a mutex lock, and some proof-of-concept tests showed significant performance improvements. However, I found that we cannot simply replace this mutex lock with a reader-writer lock or a lock-free data structure. The main reason is that `StripeSM`, as a Continuation, requires the mutex lock when called from the event system.

https://github.com/apache/trafficserver/pull/12601 is another recent attempt by @bryancall.

### Event Handlers

- Event handlers of `StripeSM`

https://github.com/apache/trafficserver/blob/7e366fa067f4916dbcec802eae049c0aea0acef6/src/iocore/cache/StripeSM.h#L118-L122

### Dir operations

Some `Dir` functions look like read-only operations but actually perform writes under some conditions.

- e.g. `Directory::probe()`

https://github.com/apache/trafficserver/blob/7e366fa067f4916dbcec802eae049c0aea0acef6/src/iocore/cache/CacheDir.cc#L528-L534

# Proposed Solution

Implement a two-tier locking architecture by decoupling `StripeSM` and `Stripe`:

1. Separate `StripeSM` (`Continuation`) and `Stripe` (shared data)

   `StripeSM` (a Continuation) contains event handlers, while `Stripe` contains shared data. Half of this change has already been completed by #11565 and related PRs, but we still need to make the separation between event handling and shared data access more explicit.

2. Add Reader-Writer Lock to Stripe

   Access to the shared data requires a reader-writer lock to allow concurrent reading.
   Alternatively, making `Stripe` a lock-free data structure (using RCU or hazard pointers) is another option.

3. Allocate StripeSM per Transaction

   Each cache operation gets a lightweight `StripeSM` instance with its own mutex for event handling. It acquires an RW lock on the shared `Stripe` for data access.

## Architecture Diagram

<img width="3600" height="885" alt="Image" src="https://github.com/user-attachments/assets/a1fafefc-fe8c-48b7-a4bf-283f432e1bec" />
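
To make the two-tier locking idea more concrete, here is a minimal C++17 sketch. It is not ATS code: `SharedStripe`, `TxnStripeSM`, `handle_open_read()`, and the "offset 0 means stale" convention are all hypothetical names and assumptions chosen for illustration. It shows a shared object guarded by a reader-writer lock, a per-transaction object with its own mutex, and one possible way to handle a lookup that occasionally has to write (the `Directory::probe()` concern): release the reader lock, take the writer lock, and re-check.

```cpp
// Build with -std=c++17 (or newer) and -pthread.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <shared_mutex>
#include <unordered_map>

// Hypothetical stand-in for the shared stripe data; NOT the real ATS `Stripe`.
class SharedStripe
{
public:
  // Assumption for this sketch only: offset 0 marks a stale directory entry.
  static constexpr std::uint64_t STALE = 0;

  // Lookup. The common case only takes the shared (reader) lock.
  std::optional<std::uint64_t> probe(std::uint64_t key)
  {
    {
      std::shared_lock<std::shared_mutex> read_lock(rw_lock_);
      auto it = dir_.find(key);
      if (it == dir_.end()) {
        return std::nullopt;
      }
      if (it->second != STALE) {
        return it->second;
      }
    } // Reader lock released here; std::shared_mutex has no in-place upgrade.

    // Rare case: the lookup must remove a stale entry. Take the exclusive
    // lock and re-check, since another thread may have changed the entry.
    std::unique_lock<std::shared_mutex> write_lock(rw_lock_);
    auto it = dir_.find(key);
    if (it == dir_.end()) {
      return std::nullopt;
    }
    if (it->second == STALE) {
      dir_.erase(it);
      return std::nullopt;
    }
    return it->second;
  }

  // Write path: exclusive (writer) lock.
  void insert(std::uint64_t key, std::uint64_t offset)
  {
    std::unique_lock<std::shared_mutex> write_lock(rw_lock_);
    dir_[key] = offset;
  }

  std::size_t size() const
  {
    std::shared_lock<std::shared_mutex> read_lock(rw_lock_);
    return dir_.size();
  }

private:
  mutable std::shared_mutex rw_lock_;
  std::unordered_map<std::uint64_t, std::uint64_t> dir_;
};

// Hypothetical per-transaction state machine; NOT the real ATS `StripeSM`.
class TxnStripeSM
{
public:
  explicit TxnStripeSM(SharedStripe &stripe) : stripe_(stripe) {}

  // Event handler: serialized only by this instance's own mutex, so two
  // transactions never contend on a common handler mutex.
  bool handle_open_read(std::uint64_t key)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    result_ = stripe_.probe(key);
    return result_.has_value();
  }

  std::optional<std::uint64_t> result() const
  {
    std::lock_guard<std::mutex> guard(mutex_);
    return result_;
  }

private:
  SharedStripe &stripe_;
  mutable std::mutex mutex_;
  std::optional<std::uint64_t> result_;
};
```

In this sketch, each transaction would construct its own `TxnStripeSM` over the same `SharedStripe` and call `handle_open_read()` from its own thread. Lookups that hit a valid entry hold only the reader lock and can proceed in parallel, while the stale-entry cleanup briefly takes the writer lock. Whether a release-and-reacquire pattern like this is acceptable for the real directory code is an open question the sketch does not answer.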
