Allow an auxiliary agent check to be run independently of the
regular a regular health check. This is enabled by the agent-check
server setting.

The agent-port, which specifies the TCP port to use for the agent's
connections, is required.

The agent-inter, which specifies the interval between agent checks and
timeout of agent checks, is optional. If not set the value for regular
checks is used.

e.g.
server  web1_1 127.0.0.1:80 check agent-port 10000

If either the health or agent check determines that a server is down
then it is marked as being down, otherwise it is marked as being up.

An agent health check performed by opening a TCP socket and reading an
ASCII string. The string should have one of the following forms:

* An ASCII representation of an positive integer percentage.
  e.g. "75%"

  Values in this format will set the weight proportional to the initial
  weight of a server as configured when haproxy starts.

* The string "drain".

  This will cause the weight of a server to be set to 0, and thus it
  will not accept any new connections other than those that are
  accepted via persistence.

* The string "down", optionally followed by a description string.

  Mark the server as down and log the description string as the reason.

* The string "stopped", optionally followed by a description string.

  This currently has the same behaviour as "down".

* The string "fail", optionally followed by a description string.

  This currently has the same behaviour as "down".

Signed-off-by: Simon Horman <ho...@verge.net.au>

---

v7
* Manual Rebase
* Add "agent-check" keyword which is used to enable agent checks
  explicitly. This allows the "agent-port" setting to be used in
  the default-server without enabling agent checks for all servers.
* Enhance documentation to note compatibility of options
  with default-server
* Enhance changelog to try and make it more obvious what the
  patch does.

* Note: lb-agent-chk has been removed by a previous patch

Notes by Willy on v6:
    The commit message confused me but fortunately the doc fixed me :-)

    I'm having an issue with using agent-port to automatically enable the
    check. We've done this mistake with the stats in the past. Specifying
    any stats setting enables the stats. The end result is that nobody
    sets the stats URL or passwords in a defaults section.

    Here we'll get the same issue. On large deployments, it's very likely
    that users will want to have "default-server agent-port 10000" in their
    defaults section, but then they will have no way to disable it for some
    servers.

    Thus I'd rather stick to the same logic we have for current health checks
    which consists in having the port being an independant and harmless setting,
    and have an "agent-check" keyword on the server lines to enable it where
    relevant.

    Please do not forget to mention the compatibility between agent-port and
    default-server in the doc.

    Also, just a stupid question, we've introduced option lb-agent-chk very
    recently and we're still in development. Don't you think it would be
    useful to remove it now before a release if agent-check can cover all
    its needs ? It would avoid having to deal with complex setups in the
    future. CCing the lb.org guys here who most likely use it already (and
    might probably plan a config migration).

v5
* Rebase for removal of server argument from init_check
* Rebase for setting of type in init_check

v4
*  Increment global.maxsock for agent-port.

   If agent-port is configured then an extra socket is required.

* Do not send requests to secondary agent checks

  The request configuration of a proxy relates to the primary health
  check and should not be sent to the secondary health check if it
  is in operation.

* Correct usage of PR_O2_LB_AGENT_CHK

  The correct way to check for PR_O2_LB_AGENT_CHK is
  not to use x & PR_O2_LB_AGENT_CHK, but rather to use
  (x & PR_O2_CHK_ANY ) == PR_O2_LB_AGENT_CHK.

v2 - v3
* No change
---
 doc/configuration.txt  | 81 +++++++++++++++++++++++++++++++++++++++++++-------
 include/types/server.h |  4 +++
 src/cfgparse.c         | 72 ++++++++++++++++++++++++++++++++++++++++++--
 src/checks.c           | 30 +++++++++++++------
 src/haproxy.c          |  6 ++++
 5 files changed, 171 insertions(+), 22 deletions(-)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 75b77c3..25fbf8f 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -777,11 +777,12 @@ nosplice
   "option splice-response".
 
 spread-checks <0..50, in percent>
-  Sometimes it is desirable to avoid sending health checks to servers at exact
-  intervals, for instance when many logical servers are located on the same
-  physical server. With the help of this parameter, it becomes possible to add
-  some randomness in the check interval between 0 and +/- 50%. A value between
-  2 and 5 seems to show good results. The default value remains at 0.
+  Sometimes it is desirable to avoid sending agent and health checks to
+  servers at exact intervals, for instance when many logical servers are
+  located on the same physical server. With the help of this parameter, it
+  becomes possible to add some randomness in the check interval between 0
+  and +/- 50%. A value between 2 and 5 seems to show good results. The
+  default value remains at 0.
 
 tune.bufsize <number>
   Sets the buffer size to this size (in bytes). Lower values allow more
@@ -7669,6 +7670,66 @@ addr <ipv4|ipv6>
 
   Supported in default-server: No
 
+agent-check
+  Enable an auxiliary agent check which is run independently of a regular
+  health check. An agent health check is performed by making a TCP
+  connection to the port set by the "agent-port" parameter" and reading
+  an ASCII string. The string should have one of the following forms:
+
+  * An ASCII representation of an positive integer percentage.
+    e.g. "75%"
+
+    Values in this format will set the weight proportional to the initial
+    weight of a server as configured when haproxy starts.
+
+  * The string "drain".
+
+    This will cause the weight of a server to be set to 0, and thus it will
+    not accept any new connections other than those that are accepted via
+    persistence.
+
+  * The string "down", optionally followed by a description string.
+
+    Mark the server as down and log the description string as the reason.
+
+  * The string "stopped", optionally followed by a description string.
+
+    This currently has the same behaviour as "down".
+
+  * The string "fail", optionally followed by a description string.
+
+    This currently has the same behaviour as "down".
+
+  Requires the ""agent-port" parameter to be set.
+  See also the "agent-check" parameter.
+
+  Supported in default-server: No
+
+agent-inter <delay>
+  The "agent-inter" parameter sets the interval between two agent checks
+  to <delay> milliseconds. If left unspecified, the delay defaults to 2000 ms.
+
+  Just as with every other time-based parameter, it may be entered in any
+  other explicit unit among { us, ms, s, m, h, d }. The "agent-inter"
+  parameter also serves as a timeout for agent checks "timeout check" is
+  not set. In order to reduce "resonance" effects when multiple servers are
+  hosted on the same hardware, the agent and health checks of all servers
+  are started with a small time offset between them. It is also possible to
+  add some random noise in the agent and health checks interval using the
+  global "spread-checks" keyword. This makes sense for instance when a lot
+  of backends use the same servers.
+
+  See also the "agent-check" and "agent-port" parameters.
+
+  Supported in default-server: Yes
+
+agent-port <port>
+  The "agent-port" parameter sets the TCP port used for agent checks.
+
+  See also the "agent-check" and "agent-inter" parameters.
+
+  Supported in default-server: Yes
+
 backup
   When "backup" is present on a server line, the server is only used in load
   balancing when all other non-backup servers are unavailable. Requests coming
@@ -7844,11 +7905,11 @@ downinter <delay>
   other explicit unit among { us, ms, s, m, h, d }. The "inter" parameter also
   serves as a timeout for health checks sent to servers if "timeout check" is
   not set. In order to reduce "resonance" effects when multiple servers are
-  hosted on the same hardware, the health-checks of all servers are started
-  with a small time offset between them. It is also possible to add some random
-  noise in the health checks interval using the global "spread-checks"
-  keyword. This makes sense for instance when a lot of backends use the same
-  servers.
+  hosted on the same hardware, the agent and health checks of all servers
+  are started with a small time offset between them. It is also possible to
+  add some random noise in the agent and health checks interval using the
+  global "spread-checks" keyword. This makes sense for instance when a lot
+  of backends use the same servers.
 
   Supported in default-server: Yes
 
diff --git a/include/types/server.h b/include/types/server.h
index 1df56e9..73d426d 100644
--- a/include/types/server.h
+++ b/include/types/server.h
@@ -55,6 +55,9 @@
 /* unused: 0x0100, 0x0200, 0x0400 */
 #define SRV_SEND_PROXY 0x0800  /* this server talks the PROXY protocol */
 #define SRV_NON_STICK  0x1000  /* never add connections allocated to this 
server to a stick table */
+#define SRV_AGENT_CHECKED  0x2000  /* this server needs to be checked using an 
agent check.
+                                   * This is run independently of the main 
check whose
+                                   * presence is indicated by the SRV_CHECKED 
flag */
 
 /* function which act on servers need to return various errors */
 #define SRV_STATUS_OK       0   /* everything is OK. */
@@ -190,6 +193,7 @@ struct server {
        } check_common;
 
        struct check check;                     /* health-check specific 
configuration */
+       struct check agent;                     /* agent specific configuration 
*/
 
 #ifdef USE_OPENSSL
        int use_ssl;                            /* ssl enabled */
diff --git a/src/cfgparse.c b/src/cfgparse.c
index 724b434..7df7de0 100644
--- a/src/cfgparse.c
+++ b/src/cfgparse.c
@@ -1325,9 +1325,13 @@ void init_default_instance()
        defproxy.defsrv.check.inter = DEF_CHKINTR;
        defproxy.defsrv.check.fastinter = 0;
        defproxy.defsrv.check.downinter = 0;
+       defproxy.defsrv.agent.inter = DEF_CHKINTR;
+       defproxy.defsrv.agent.fastinter = 0;
+       defproxy.defsrv.agent.downinter = 0;
        defproxy.defsrv.rise = DEF_RISETIME;
        defproxy.defsrv.fall = DEF_FALLTIME;
        defproxy.defsrv.check.port = 0;
+       defproxy.defsrv.agent.port = 0;
        defproxy.defsrv.maxqueue = 0;
        defproxy.defsrv.minconn = 0;
        defproxy.defsrv.maxconn = 0;
@@ -4172,7 +4176,7 @@ stats_error_parsing:
        else if (!strcmp(args[0], "server") || !strcmp(args[0], 
"default-server")) {  /* server address */
                int cur_arg;
                short realport = 0;
-               int do_check = 0, defsrv = (*args[0] == 'd');
+               int do_agent = 0, do_check = 0, defsrv = (*args[0] == 'd');
 
                if (!defsrv && curproxy == &defproxy) {
                        Alert("parsing [%s:%d] : '%s' not allowed in 'defaults' 
section.\n", file, linenum, args[0]);
@@ -4219,6 +4223,7 @@ stats_error_parsing:
                        LIST_INIT(&newsrv->actconns);
                        LIST_INIT(&newsrv->pendconns);
                        do_check = 0;
+                       do_agent = 0;
                        newsrv->state = SRV_RUNNING; /* early server setup */
                        newsrv->last_change = now.tv_sec;
                        newsrv->id = strdup(args[1]);
@@ -4272,11 +4277,16 @@ stats_error_parsing:
                                goto out;
                        }
 
-                       newsrv->check.use_ssl = curproxy->defsrv.check.use_ssl;
+                       newsrv->check.use_ssl   = 
curproxy->defsrv.check.use_ssl;
                        newsrv->check.port      = curproxy->defsrv.check.port;
                        newsrv->check.inter     = curproxy->defsrv.check.inter;
                        newsrv->check.fastinter = 
curproxy->defsrv.check.fastinter;
                        newsrv->check.downinter = 
curproxy->defsrv.check.downinter;
+                       newsrv->agent.use_ssl   = 
curproxy->defsrv.agent.use_ssl;
+                       newsrv->agent.port      = curproxy->defsrv.agent.port;
+                       newsrv->agent.inter     = curproxy->defsrv.agent.inter;
+                       newsrv->agent.fastinter = 
curproxy->defsrv.agent.fastinter;
+                       newsrv->agent.downinter = 
curproxy->defsrv.agent.downinter;
                        newsrv->rise            = curproxy->defsrv.rise;
                        newsrv->fall            = curproxy->defsrv.fall;
                        newsrv->maxqueue        = curproxy->defsrv.maxqueue;
@@ -4296,6 +4306,10 @@ stats_error_parsing:
                        newsrv->check.health    = newsrv->rise; /* up, but will 
fall down at first failure */
                        newsrv->check.server    = newsrv;
 
+                       newsrv->agent.status    = HCHK_STATUS_INI;
+                       newsrv->agent.health    = newsrv->rise; /* up, but will 
fall down at first failure */
+                       newsrv->agent.server    = newsrv;
+
                        cur_arg = 3;
                } else {
                        newsrv = &curproxy->defsrv;
@@ -4303,7 +4317,33 @@ stats_error_parsing:
                }
 
                while (*args[cur_arg]) {
-                       if (!defsrv && !strcmp(args[cur_arg], "cookie")) {
+                       if (!strcmp(args[cur_arg], "agent-check")) {
+                               global.maxsock++;
+                               do_agent = 1;
+                               cur_arg += 1;
+                       } else if (!strcmp(args[cur_arg], "agent-inter")) {
+                               const char *err = parse_time_err(args[cur_arg + 
1], &val, TIME_UNIT_MS);
+                               if (err) {
+                                       Alert("parsing [%s:%d] : unexpected 
character '%c' in 'agent-inter' argument of server %s.\n",
+                                             file, linenum, *err, newsrv->id);
+                                       err_code |= ERR_ALERT | ERR_FATAL;
+                                       goto out;
+                               }
+                               if (val <= 0) {
+                                       Alert("parsing [%s:%d]: invalid value 
%d for argument '%s' of server %s.\n",
+                                             file, linenum, val, 
args[cur_arg], newsrv->id);
+                                       err_code |= ERR_ALERT | ERR_FATAL;
+                                       goto out;
+                               }
+                               newsrv->agent.inter = val;
+                               cur_arg += 2;
+                       }
+                       else if (!strcmp(args[cur_arg], "agent-port")) {
+                               global.maxsock++;
+                               newsrv->agent.port = atol(args[cur_arg + 1]);
+                               cur_arg += 2;
+                       }
+                       else if (!defsrv && !strcmp(args[cur_arg], "cookie")) {
                                newsrv->cookie = strdup(args[cur_arg + 1]);
                                newsrv->cklen = strlen(args[cur_arg + 1]);
                                cur_arg += 2;
@@ -4331,6 +4371,8 @@ stats_error_parsing:
 
                                if (newsrv->check.health)
                                        newsrv->check.health = newsrv->rise;
+                               if (newsrv->agent.health)
+                                       newsrv->agent.health = newsrv->rise;
                                cur_arg += 2;
                        }
                        else if (!strcmp(args[cur_arg], "fall")) {
@@ -4512,6 +4554,7 @@ stats_error_parsing:
                                newsrv->state |= SRV_MAINTAIN;
                                newsrv->state &= ~SRV_RUNNING;
                                newsrv->check.health = 0;
+                               newsrv->agent.health = 0;
                                cur_arg += 1;
                        }
                        else if (!defsrv && !strcmp(args[cur_arg], "observe")) {
@@ -4913,6 +4956,28 @@ stats_error_parsing:
                        newsrv->state |= SRV_CHECKED;
                }
 
+               if (do_agent) {
+                       int ret;
+
+                       if (!newsrv->agent.port) {
+                               Alert("parsing [%s:%d] : server %s does not 
have agent port. Agent check has been disabled.\n",
+                                     file, linenum, newsrv->id);
+                               err_code |= ERR_ALERT | ERR_FATAL;
+                               goto out;
+                       }
+
+                       if (!newsrv->agent.inter)
+                               newsrv->agent.inter = newsrv->check.inter;
+
+                       ret = init_check(&newsrv->agent, PR_O2_LB_AGENT_CHK, 
file, linenum);
+                       if (ret) {
+                               err_code |= ret;
+                               goto out;
+                       }
+
+                       newsrv->state |= SRV_AGENT_CHECKED;
+               }
+
                if (!defsrv) {
                        if (newsrv->state & SRV_BACKUP)
                                curproxy->srv_bck++;
@@ -6802,6 +6867,7 @@ out_uri_auth_compat:
                                        newsrv->state |= SRV_MAINTAIN;
                                        newsrv->state &= ~SRV_RUNNING;
                                        newsrv->check.health = 0;
+                                       newsrv->agent.health = 0;
                                }
 
                                newsrv->track = srv;
diff --git a/src/checks.c b/src/checks.c
index d865c0b..cffba02 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -398,7 +398,7 @@ void set_server_down(struct check *check)
                check->health = s->rise;
        }
 
-       if (check->health == s->rise || s->track) {
+       if ((s->state & SRV_RUNNING && check->health == s->rise) || s->track) {
                int srv_was_paused = s->state & SRV_GOINGDOWN;
                int prev_srv_count = s->proxy->srv_bck + s->proxy->srv_act;
 
@@ -465,7 +465,8 @@ void set_server_up(struct check *check) {
                check->health = s->rise;
        }
 
-       if (check->health == s->rise || s->track) {
+       if ((s->check.health >= s->rise && s->agent.health >= s->rise &&
+            check->health == s->rise) || s->track) {
                if (s->proxy->srv_bck == 0 && s->proxy->srv_act == 0) {
                        if (s->proxy->last_change < now.tv_sec)         // 
ignore negative times
                                s->proxy->down_time += now.tv_sec - 
s->proxy->last_change;
@@ -1314,8 +1315,11 @@ static struct task *process_chk(struct task *t)
                check->bo->p = check->bo->data;
                check->bo->o = 0;
 
-               /* prepare the check buffer */
-               if (check->type) {
+              /* prepare the check buffer
+               * This should not be used if check is the secondary agent check
+               * of a server as s->proxy->check_req will relate to the
+               * configuration of the primary check */
+              if (check->type && check != &s->agent) {
                        bo_putblk(check->bo, s->proxy->check_req, 
s->proxy->check_len);
 
                        /* we want to check if this host replies to HTTP or 
SSLv3 requests
@@ -1584,12 +1588,20 @@ int start_checks() {
         */
        for (px = proxy; px; px = px->next) {
                for (s = px->srv; s; s = s->next) {
-                       if (!(s->state & SRV_CHECKED))
-                               continue;
+                       /* A task for the main check */
+                       if (s->state & SRV_CHECKED) {
+                               if (!start_check_task(&s->check, mininter, 
nbcheck, srvpos))
+                                       return -1;
+                               srvpos++;
+                       }
 
-                       if (!start_check_task(&s->check, mininter, nbcheck, 
srvpos))
-                               return -1;
-                       srvpos++;
+                       /* A task for a auxiliary agent check */
+                       if (s->state & SRV_AGENT_CHECKED) {
+                               if (!start_check_task(&s->agent, mininter, 
nbcheck, srvpos)) {
+                                       return -1;
+                               }
+                               srvpos++;
+                       }
                }
        }
        return 0;
diff --git a/src/haproxy.c b/src/haproxy.c
index bc03a73..e03219a 100644
--- a/src/haproxy.c
+++ b/src/haproxy.c
@@ -1120,6 +1120,10 @@ void deinit(void)
                                task_delete(s->check.task);
                                task_free(s->check.task);
                        }
+                       if (s->agent.task) {
+                               task_delete(s->agent.task);
+                               task_free(s->agent.task);
+                       }
 
                        if (s->warmup) {
                                task_delete(s->warmup);
@@ -1130,6 +1134,8 @@ void deinit(void)
                        free(s->cookie);
                        free(s->check.bi);
                        free(s->check.bo);
+                       free(s->agent.bi);
+                       free(s->agent.bo);
                        free(s);
                        s = s_next;
                }/* end while(s) */
-- 
1.8.4


Reply via email to