On Wed, Nov 27, 2013 at 05:09:47PM +0100, Willy Tarreau wrote: > On Wed, Nov 27, 2013 at 10:41:12PM +0800, Igor wrote: > > It's almost the same except some servers with weight=0 in conf, script > > to disable/enable servers works fine with haproxy-ss-20131031, but > > with haproxy-ss-20131122 or newer, it will crash haproxy after > > several disable/enable commands via socket. I can mail you the conf if > > need. > > OK thank you for your config, Igor. It definitely helped. The issue was > related to the "track" servers with the recent changes. It caused an > (almost) infinite recursion. It was easier to see here because I have > no server responding to your addresses, so the process crashes as soon > as the first server fails the first check. > > I'm attaching the fix to apply on top of your snapshot, that I'm going > to merge into mainline. > > Simon, I suggest you apply the attached patch on top of your development > branch in case you still work on the lb agent checks, because then you're > affected as well.
Thanks and sorry for the bug. > > Thanks, > Willy > > >From 151cd51485f8ebe47aa3b2f2eedb604d4c2cfd0d Mon Sep 17 00:00:00 2001 > From: Willy Tarreau <w...@1wt.eu> > Date: Wed, 27 Nov 2013 16:52:23 +0100 > Subject: BUG/MAJOR: fix haproxy crash when using server tracking instead of > checks > > Igor at owind reported a very recent bug (just present in latest snapshot). > Commit "4a741432 MEDIUM: Paramatise functions over the check of a server" > causes up/down to die with tracked servers due to a typo. > > The following call in set_server_down causes the server to put itself > down recurseively because "check" is the current server's check, so once > fed to the function again, it will pass through the exact same path (note > we have the exact symmetry in set_server_up) : > > for (srv = s->tracknext; srv; srv = srv->tracknext) > if (!(srv->state & SRV_MAINTAIN)) > /* Only notify tracking servers that are not already in > maintenance. */ > set_server_down(check); > > Instead we should stop the tracking server being visited in the loop : > > for (srv = s->tracknext; srv; srv = srv->tracknext) > if (!(srv->state & SRV_MAINTAIN)) > /* Only notify tracking servers that are not already in > maintenance. */ > set_server_down(&srv->check); > > But that's not exactly enough because srv->check->server is only set when > checks are enabled, so ->server is NULL for tracking servers, still causing a > crash upon first iteration. The fix is easy and consists in always > initializing > check->server when creating a new server, which is what was already done a few > patches later by 69d29f9 (MEDIUM: cfgparse: Factor out check initialisation). > > With the fix above alone on top of current version or snapshot 20131122, the > problem disappears. > > Thanks to Igor for testing and reporting the issue. > --- > src/checks.c | 12 ++++++------ > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/src/checks.c b/src/checks.c > index cfbb8a3..a722c12 100644 > --- a/src/checks.c > +++ b/src/checks.c > @@ -445,10 +445,10 @@ void set_server_down(struct check *check) > s->counters.down_trans++; > > if (s->state & SRV_CHECKED) > - for(srv = s->tracknext; srv; srv = srv->tracknext) > - if (! (srv->state & SRV_MAINTAIN)) > + for (srv = s->tracknext; srv; srv = srv->tracknext) > + if (!(srv->state & SRV_MAINTAIN)) > /* Only notify tracking servers that > are not already in maintenance. */ > - set_server_down(check); > + set_server_down(&srv->check); > } > > check->health = 0; /* failure */ > @@ -520,10 +520,10 @@ void set_server_up(struct check *check) { > send_log(s->proxy, LOG_NOTICE, "%s.\n", trash.str); > > if (s->state & SRV_CHECKED) > - for(srv = s->tracknext; srv; srv = srv->tracknext) > - if (! (srv->state & SRV_MAINTAIN)) > + for (srv = s->tracknext; srv; srv = srv->tracknext) > + if (!(srv->state & SRV_MAINTAIN)) > /* Only notify tracking servers if > they're not in maintenance. */ > - set_server_up(check); > + set_server_up(&srv->check); > } > > if (check->health >= s->rise) > -- > 1.7.12.1 >