On Wed, Nov 27, 2013 at 10:41:12PM +0800, Igor wrote: > It's almost the same except some servers with weight=0 in conf, script > to disable/enable servers works fine with haproxy-ss-20131031, but > with haproxy-ss-20131122 or newer, it will crash haproxy after > several disable/enable commands via socket. I can mail you the conf if > need.
OK thank you for your config, Igor. It definitely helped. The issue was related to the "track" servers with the recent changes. It caused an (almost) infinite recursion. It was easier to see here because I have no server responding to your addresses, so the process crashes as soon as the first server fails the first check. I'm attaching the fix to apply on top of your snapshot, that I'm going to merge into mainline. Simon, I suggest you apply the attached patch on top of your development branch in case you still work on the lb agent checks, because then you're affected as well. Thanks, Willy
>From 151cd51485f8ebe47aa3b2f2eedb604d4c2cfd0d Mon Sep 17 00:00:00 2001 From: Willy Tarreau <[email protected]> Date: Wed, 27 Nov 2013 16:52:23 +0100 Subject: BUG/MAJOR: fix haproxy crash when using server tracking instead of checks Igor at owind reported a very recent bug (just present in latest snapshot). Commit "4a741432 MEDIUM: Paramatise functions over the check of a server" causes up/down to die with tracked servers due to a typo. The following call in set_server_down causes the server to put itself down recurseively because "check" is the current server's check, so once fed to the function again, it will pass through the exact same path (note we have the exact symmetry in set_server_up) : for (srv = s->tracknext; srv; srv = srv->tracknext) if (!(srv->state & SRV_MAINTAIN)) /* Only notify tracking servers that are not already in maintenance. */ set_server_down(check); Instead we should stop the tracking server being visited in the loop : for (srv = s->tracknext; srv; srv = srv->tracknext) if (!(srv->state & SRV_MAINTAIN)) /* Only notify tracking servers that are not already in maintenance. */ set_server_down(&srv->check); But that's not exactly enough because srv->check->server is only set when checks are enabled, so ->server is NULL for tracking servers, still causing a crash upon first iteration. The fix is easy and consists in always initializing check->server when creating a new server, which is what was already done a few patches later by 69d29f9 (MEDIUM: cfgparse: Factor out check initialisation). With the fix above alone on top of current version or snapshot 20131122, the problem disappears. Thanks to Igor for testing and reporting the issue. --- src/checks.c | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/src/checks.c b/src/checks.c index cfbb8a3..a722c12 100644 --- a/src/checks.c +++ b/src/checks.c @@ -445,10 +445,10 @@ void set_server_down(struct check *check) s->counters.down_trans++; if (s->state & SRV_CHECKED) - for(srv = s->tracknext; srv; srv = srv->tracknext) - if (! (srv->state & SRV_MAINTAIN)) + for (srv = s->tracknext; srv; srv = srv->tracknext) + if (!(srv->state & SRV_MAINTAIN)) /* Only notify tracking servers that are not already in maintenance. */ - set_server_down(check); + set_server_down(&srv->check); } check->health = 0; /* failure */ @@ -520,10 +520,10 @@ void set_server_up(struct check *check) { send_log(s->proxy, LOG_NOTICE, "%s.\n", trash.str); if (s->state & SRV_CHECKED) - for(srv = s->tracknext; srv; srv = srv->tracknext) - if (! (srv->state & SRV_MAINTAIN)) + for (srv = s->tracknext; srv; srv = srv->tracknext) + if (!(srv->state & SRV_MAINTAIN)) /* Only notify tracking servers if they're not in maintenance. */ - set_server_up(check); + set_server_up(&srv->check); } if (check->health >= s->rise) -- 1.7.12.1

