Re: 1.9.6: SIGFPE in fwrr_update_position

Willy Tarreau Wed, 24 Apr 2019 05:49:46 -0700

Maksim,

On Wed, Apr 24, 2019 at 07:53:08AM +0200, Willy Tarreau wrote:
> Hi Maksim,
> 
> On Wed, Apr 24, 2019 at 08:39:23AM +0300, ?????? ????????? wrote:
> > Hi!
> > 
> > It seems to me there is something wrong with this patch: for some reason
> > process stops responding with 100% CPU used by all threads.
> 
> Ouch! This looks like an awful AB/BA deadlock. Indeed,
> fwrr_get_server_from_group() wants the lbprm lock and will grab the
> server's lock while fwrr_update_server_weight() wants the server's
> lock to be held and will use the lbprm lock :-(
> 
> I need to revisit all this then :-(


I completely audited the RR code again and was able to get rid of the
server lock that was causing the issue there. I could instantly reproduce
your bug by switching servers ON and OFF 50 times a second while
injecting at 200000 requests/s; with the fix applied everything is
stable. I've backported the fix to 1.9, so you may want to pull to be
safe again.
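
For readers who want the AB/BA pattern made concrete, here is a minimal
sketch in C. It is not the HAProxy code (which is not shown in this
thread): struct lb_group, pick_server(), set_weight() and run_demo() are
hypothetical names, and a pthread mutex stands in for the lbprm lock. It
only illustrates the general idea described above: once the pick path and
the weight-update path both rely on one single lock, there is no second
lock left to be taken in the opposite order, so this class of deadlock
cannot occur.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NB_SRV 2

/* Hypothetical stand-in for the LB state; NOT HAProxy's real structures. */
struct lb_group {
    pthread_mutex_t lbprm_lock; /* the only lock guarding the RR state */
    int weight[NB_SRV];         /* per-server weight, 0 means server OFF */
    unsigned int next;          /* round-robin cursor */
};

static void lb_group_init(struct lb_group *g)
{
    pthread_mutex_init(&g->lbprm_lock, NULL);
    for (int i = 0; i < NB_SRV; i++)
        g->weight[i] = 1;
    g->next = 0;
}

/* Pick path: takes only lbprm_lock, never a per-server lock. */
static int pick_server(struct lb_group *g)
{
    int chosen = -1; /* -1 means no usable server */

    pthread_mutex_lock(&g->lbprm_lock);
    for (unsigned int i = 0; i < NB_SRV; i++) {
        unsigned int idx = (g->next + i) % NB_SRV;
        if (g->weight[idx] > 0) {
            chosen = (int)idx;
            g->next = idx + 1;
            break;
        }
    }
    pthread_mutex_unlock(&g->lbprm_lock);
    return chosen;
}

/* Update path: same single lock, so no opposite lock order can exist. */
static void set_weight(struct lb_group *g, int idx, int w)
{
    assert(idx >= 0 && idx < NB_SRV);
    pthread_mutex_lock(&g->lbprm_lock);
    g->weight[idx] = w;
    pthread_mutex_unlock(&g->lbprm_lock);
}

struct worker { struct lb_group *g; int bad; };

static void *picker(void *arg)
{
    struct worker *w = arg;
    for (int i = 0; i < 100000; i++) {
        int s = pick_server(w->g);
        if (s < -1 || s >= NB_SRV)
            w->bad++; /* count results outside the valid range */
    }
    return NULL;
}

static void *toggler(void *arg)
{
    struct lb_group *g = arg;
    for (int i = 0; i < 100000; i++)
        set_weight(g, i % NB_SRV, (i & 2) ? 1 : 0); /* flip ON/OFF */
    return NULL;
}

/* Runs picks and weight changes concurrently; returns invalid pick count. */
static int run_demo(void)
{
    struct lb_group g;
    struct worker w = { &g, 0 };
    pthread_t a, b;

    lb_group_init(&g);
    pthread_create(&a, NULL, picker, &w);
    pthread_create(&b, NULL, toggler, &g);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    pthread_mutex_destroy(&g.lbprm_lock);
    return w.bad;
}
```

Calling run_demo() is a much smaller version of the stress test above:
one thread keeps picking servers while another keeps switching them ON
and OFF. If a second, per-server mutex were taken inside pick_server()
and also held around set_weight(), the two threads could each hold one
lock while waiting for the other, which is the AB/BA situation.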

Cheers,
Willy
