Hi Krishna,

On Thu, Oct 11, 2018 at 12:04:31PM +0530, Krishna Kumar (Engineering) wrote:
> I must say the improvements are pretty impressive!
>
> Earlier number reported with 24 processes: 519K
> Earlier number reported with 24 threads: 79K
> New RPS with system irq tuning, today's git,
> configuration changes, 24 threads: 353K
> Old code with same tuning gave: 290K
OK, that's much better, but I'm still horrified by the time taken in the load balancing algorithm. I thought it could be fwlc_reposition(), which contains an eb32_delete()+eb32_insert() pair, so I replaced these with a new eb32_move() which moves the node within the tree, but it didn't change anything here.

Also, I cannot manage to reach such a high time spent in this lock (300ms here, 58s for you). There is one possible difference that might explain it: do you have a maxconn setting on your servers? If so, is it possible that it's reached? You can take a look at your stats page and check whether the "Queue/Max" entry for any backend is non-zero.

Indeed, I'm seeing that once a server is saturated, we skip it for the next one, and this part can be expensive. Ideally we should remove such servers from the tree until they're unblocked, but one special case makes this difficult: the dynamic limitation (minconn+maxconn+fullconn). However, I think we could improve this so that only this use case is affected and not the other ones.

I'm also seeing that this lock could be replaced by an RW lock. But before taking a deeper look, I'm interested in verifying that this is indeed the situation you're facing.

Thanks,
Willy
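For reference, the saturation scenario described above arises with a configuration along these lines (an illustrative fragment, not taken from Krishna's setup; the server names and numbers are made up, but maxconn/minconn on "server" lines and fullconn on the backend are the real keywords involved):

```
backend app
    balance leastconn
    fullconn 1000
    # once a server's connection limit is reached, new requests queue
    # and the balancer has to skip it for the next candidate
    server web1 10.0.0.1:80 minconn 10 maxconn 100
    server web2 10.0.0.2:80 minconn 10 maxconn 100
```

If "Queue/Max" on the stats page is non-zero for this backend, the limit is being hit.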
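To illustrate why the dynamic limitation makes removing saturated servers from the tree tricky, here is a rough sketch (my own simplification, not haproxy's actual code) of how an effective per-server limit can scale with backend load between minconn and maxconn, reaching maxconn when the backend carries fullconn connections; because this limit moves as load changes, a server "removed" at one limit may become eligible again without any event on the server itself:

```c
#include <assert.h>

/* Sketch only: effective connection limit for a server under the
 * dynamic limitation. beconn is the backend's current connection
 * count. The limit grows linearly with backend load, never below
 * minconn, and caps at maxconn once the backend reaches fullconn. */
static unsigned int dyn_maxconn(unsigned int minconn, unsigned int maxconn,
                                unsigned int fullconn, unsigned int beconn)
{
    unsigned int max;

    if (!fullconn || beconn >= fullconn)
        return maxconn;              /* backend fully loaded: full limit */
    max = maxconn * beconn / fullconn;
    return max < minconn ? minconn : max;
}

int main(void)
{
    /* idle backend: the limit stays at minconn */
    assert(dyn_maxconn(10, 100, 1000, 0) == 10);
    /* half-loaded backend: the limit is half of maxconn */
    assert(dyn_maxconn(10, 100, 1000, 500) == 50);
    /* saturated backend: the full maxconn applies */
    assert(dyn_maxconn(10, 100, 1000, 1000) == 100);
    return 0;
}
```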