Maksim,

On Wed, Apr 24, 2019 at 07:53:08AM +0200, Willy Tarreau wrote:
> Hi Maksim,
> 
> On Wed, Apr 24, 2019 at 08:39:23AM +0300, ?????? ????????? wrote:
> > Hi!
> > 
> > It seems to me there is something wrong with this patch: for some reason
> > the process stops responding, with 100% CPU used by all threads.
> 
> Ouch! This looks like an awful AB/BA deadlock. Indeed,
> fwrr_get_server_from_group() wants the lbprm lock and will grab the
> server's lock while fwrr_update_server_weight() wants the server's
> lock to be held and will use the lbprm lock :-(
> 
> I need to revisit all this then :-(

I completely audited the RR code again and could get rid of the server
lock which was causing the issue there. I could instantly reproduce
your bug by switching servers ON and OFF 50 times a second while
injecting at 200000 requests/s, and now after the fix everything is
stable. I've backported the fix to 1.9, so you may want to pull to be
safe again.
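
For readers less familiar with the AB/BA pattern discussed above, here is a
minimal sketch of the idea behind the fix. It is not the actual HAProxy code:
the structures, pick_server(), update_weight() and run_stress() are
hypothetical stand-ins, and the only thing assumed from the thread is that the
server lock was dropped from the round-robin path so that both code paths rely
on a single lock.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical, simplified types; not HAProxy's real structures. */
struct lb_group {
    pthread_mutex_t lock;   /* stands in for the lbprm lock */
    int total_weight;       /* sum of the weights of all servers */
};

struct server {
    struct lb_group *grp;   /* no per-server lock on this path anymore */
    int weight;
    int picks;
};

/*
 * Deadlock-prone ordering (described, not implemented here):
 *   picker : lock(group) then lock(server)
 *   updater: lock(server) then lock(group)
 * Each thread can end up holding one lock while waiting for the other.
 * Below, both paths take only the group lock, so no cycle is possible.
 */
void pick_server(struct server *s)
{
    pthread_mutex_lock(&s->grp->lock);
    s->picks++;
    pthread_mutex_unlock(&s->grp->lock);
}

void update_weight(struct server *s, int w)
{
    assert(w >= 0);
    pthread_mutex_lock(&s->grp->lock);
    s->grp->total_weight += w - s->weight;  /* keep the sum consistent */
    s->weight = w;
    pthread_mutex_unlock(&s->grp->lock);
}

#define ROUNDS 100000

static void *picker(void *arg)
{
    for (int i = 0; i < ROUNDS; i++)
        pick_server(arg);
    return NULL;
}

static void *updater(void *arg)
{
    /* Toggle the weight between 0 and 1, like a server going OFF and ON. */
    for (int i = 0; i < ROUNDS; i++)
        update_weight(arg, i & 1);
    return NULL;
}

/* Run both paths concurrently; returns 0 once both threads have finished. */
int run_stress(struct server *s)
{
    pthread_t a, b;

    if (pthread_create(&a, NULL, picker, s) != 0)
        return -1;
    if (pthread_create(&b, NULL, updater, s) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return 0;
}
```

Calling run_stress() on a server whose group lock was initialized with
PTHREAD_MUTEX_INITIALIZER exercises both paths at once (build with -pthread).
An alternative that keeps both locks is to always take them in the same order
in every code path; the sketch uses the single-lock variant only because it is
closer to what is described above.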

Cheers,
Willy
