c-taylor opened a new issue #7375:
URL: https://github.com/apache/trafficserver/issues/7375


   As parents in a topology are marked down under load, 
'HostStatus::getHostStatus' can cause excessive lock contention, resulting in 
high system time, reduced throughput and holes in the stats.
   
   During failure testing, overloading the configured parents causes lock 
contention on the stats storage.
   It was possible to consume almost all ET_NET thread time with a few failing 
parents and fewer than 5,000 RPS.
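
To illustrate the kind of contention described above, here is a minimal sketch. It is not the actual `HostStatus` implementation: `HostStatusTable`, `Status` and `count_down_lookups` are hypothetical names, and it assumes only that many worker threads look up the status of every parent on each request. Under that assumption, a reader/writer lock lets lookups proceed concurrently, while writers (marking a parent down) stay exclusive.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a host-status table; NOT the ATS HostStatus class.
enum class Status { Up, Down };

class HostStatusTable {
public:
  // Writers (marking a host up or down) take the lock exclusively.
  void set(const std::string &host, Status s) {
    std::unique_lock<std::shared_mutex> lock(mutex_);
    table_[host] = s;
  }

  // Readers take a shared lock, so concurrent lookups do not serialize each other.
  Status get(const std::string &host) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);
    auto it = table_.find(host);
    // Assumption for this sketch: unknown hosts are treated as up.
    return it == table_.end() ? Status::Up : it->second;
  }

private:
  mutable std::shared_mutex mutex_;
  std::unordered_map<std::string, Status> table_;
};

// Simulate several worker threads that each check every parent once per "request".
// Returns the total number of lookups that reported a parent as down.
std::size_t count_down_lookups(const HostStatusTable &table,
                               const std::vector<std::string> &parents,
                               int threads, int requests_per_thread) {
  assert(threads > 0 && requests_per_thread >= 0);
  std::atomic<std::size_t> down{0};
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int r = 0; r < requests_per_thread; ++r) {
        for (const auto &p : parents) {
          if (table.get(p) == Status::Down) {
            ++down;
          }
        }
      }
    });
  }
  for (auto &w : workers) {
    w.join();
  }
  return down.load();
}
```

Swapping `std::shared_mutex` for a plain `std::mutex` in this sketch makes every lookup exclusive, which is one way a per-request status check can end up dominating thread time as the request rate grows.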
   
   ### Fault replication
   Increase load through an edge -> parent configuration until the parents 
start to fail.
   I used connection limits as the failure trigger because they fail 
predictably.
   
   ### Observations
   As parents fail there is an increase in 'HostStatus::getHostStatus' 
contention, especially when the last parent fails.
   This reduces all 'good' work and causes errors to clients, even for content 
already in cache.
   
   1. perf traces and flame graphs show near 100% system consumption on lock 
activity.
   <img width="505" alt="getHostStaus_crop" 
src="https://user-images.githubusercontent.com/12032425/101389970-10b06400-38ba-11eb-8ccb-8bc829c63814.png">
   
   2. traffic_server metrics stop updating
   3. Response and data rates drop
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

