Maksim,

On Wed, Apr 24, 2019 at 07:53:08AM +0200, Willy Tarreau wrote:
> Hi Maksim,
>
> On Wed, Apr 24, 2019 at 08:39:23AM +0300, ?????? ????????? wrote:
> > Hi!
> >
> > It seems to me there is something wrong with this patch: for some reason
> > process stops responding with 100% CPU used by all threads.
>
> Ouch! This looks like an awful AB/BA deadlock. Indeed,
> fwrr_get_server_from_group() wants the lbprm lock and will grab the
> server's lock while fwrr_update_server_weight() wants the server's
> lock to be held and will use the lbprm lock :-(
>
> I need to revisit all this then :-(

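
For illustration, the AB/BA pattern quoted above can be reduced to a minimal sketch. This is not the actual haproxy code: it uses plain pthread mutexes, and every name in it (lbprm_lock, srv_lock, pick_server_abba, pick_server_fixed, update_weight, run_fixed) is a hypothetical stand-in. One path takes the lbprm lock then the server lock, the other takes them in the opposite order, so two threads can each hold one lock and wait forever for the other. With locks that spin, that would presumably show up as threads stuck at 100% CPU. Dropping the server lock from the picker path removes the inversion.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-ins for the two locks; not haproxy's real types. */
static pthread_mutex_t lbprm_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t srv_lock   = PTHREAD_MUTEX_INITIALIZER;

int  srv_weight = 1;  /* protected by lbprm_lock in this sketch */
long picks      = 0;  /* protected by lbprm_lock in this sketch */

/* Deadlock-prone picker: lbprm lock first, then server lock (A then B).
 * Shown for contrast only; it is never called below. */
void pick_server_abba(void)
{
    pthread_mutex_lock(&lbprm_lock);
    pthread_mutex_lock(&srv_lock);
    picks++;
    pthread_mutex_unlock(&srv_lock);
    pthread_mutex_unlock(&lbprm_lock);
}

/* Weight update: server lock first, then lbprm lock (B then A).
 * Run concurrently with pick_server_abba(), this can deadlock. */
void update_weight(int w)
{
    pthread_mutex_lock(&srv_lock);
    pthread_mutex_lock(&lbprm_lock);
    srv_weight = w;
    pthread_mutex_unlock(&lbprm_lock);
    pthread_mutex_unlock(&srv_lock);
}

/* Picker without the server lock: only one lock is taken on this path,
 * so there is no ordering left to invert against update_weight(). */
void pick_server_fixed(void)
{
    pthread_mutex_lock(&lbprm_lock);
    picks++;
    pthread_mutex_unlock(&lbprm_lock);
}

static void *picker(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++)
        pick_server_fixed();
    return NULL;
}

static void *updater(void *arg)
{
    long n = *(long *)arg;
    for (long i = 0; i < n; i++)
        update_weight((int)(i & 1));  /* flip the weight between 0 and 1 */
    return NULL;
}

/* Runs the picker and the updater concurrently, n iterations each.
 * Returns the number of picks performed, or -1 if a thread failed to start. */
long run_fixed(long n)
{
    pthread_t a, b;

    assert(n >= 0);
    picks = 0;
    if (pthread_create(&a, NULL, picker, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, updater, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return picks;
}
```

Calling run_fixed() with a large iteration count is a crude way to hammer both paths at once, in the same spirit as flipping servers while injecting traffic. Swapping pick_server_fixed() for pick_server_abba() in picker() is expected to hang sooner or later. Build with -pthread.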
I completely audited the RR code again and could get rid of the server
lock which was causing the issue there. I could instantly reproduce your
bug by switching servers ON and OFF 50 times a second while injecting at
200000 requests/s, and now after the fix everything is stable.

I've backported the fix to 1.9, you may want to pull to be safe again.

Cheers,
Willy