In the case where agent-port is used and the agent check is a
secondary check, do not mark a server as down if the agent
becomes unavailable.

In this configuration the agent should only cause a server to be marked
as down if the agent returns "fail", "stopped" or "down".

Signed-off-by: Simon Horman <[email protected]>

---

Status: pending

v7
* Manual Rebase
* Update documentation.
  It is now appropriate to document this behaviour under "agent-check"
  rather than "agent-port"
* Remove WILLDRAIN portions of this patch as they are unrelated
  to the changelog and the rest of the patch.

Notes by Willy on v6:
    While I absolutely agree with the goal described in the commit message,
    I fail to connect it to what I'm seeing in the code. I don't know if it
    is a result of the rework of the patches but for me all it does is apply
    a second DRAIN state whose purpose I don't understand. I'm seeing that
    we first set WILLDRAIN, then update the status, then add DRAIN on top of
    that, that remains unclear to me. I'm not sure what situation the 4
    combinations of these flags correspond to.

v4
* Never update health of agent on failure to connect

  Previously the health was recharged on such cases.
  But this causes the health to be recharged even
  if the most recent result from the agent was "fail", "stopped", or "down".

  The result was unnecessary logging of "fail", "stopped", or "down"
  results from the agent if they were interleaved with timeouts.
  Or in other words, if the agent responded only occasionally
  but consistently with "fail", "stopped", or "down".

v3
* No change

v2
* Only log agent server status if the change in status
  will affect the server's state
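
Illustration (not part of the patch): the rule described in the commit
message can be sketched as a small standalone C fragment. This is a
simplified model only. The sketch_* names, the SK_STATUS_* enum and its
reduced set of members are hypothetical stand-ins for struct check and the
HCHK_STATUS_* values; only the relative ordering used by the two
comparisons in the diff is assumed.

```c
#include <stdbool.h>

/* Hypothetical, reduced stand-in for the HCHK_STATUS_* values.
 * Assumption: layer 4 statuses sort below layer 7 statuses. */
enum sketch_status {
	SK_STATUS_L4TOUT, /* layer 4 timeout, agent not reached */
	SK_STATUS_L4CON,  /* layer 4 connection error, agent not reached */
	SK_STATUS_L7TOUT, /* layer 7 timeout */
	SK_STATUS_L7OKD,  /* agent replied with an acceptable result */
	SK_STATUS_L7STS   /* agent replied "fail", "stopped" or "down" */
};

/* Hypothetical, reduced stand-in for struct check. */
struct sketch_check {
	bool is_agent;             /* models check == &s->agent */
	enum sketch_status status; /* models check->status */
};

/* Models the early return added to check_failed(): an agent check only
 * counts as a failure when the agent explicitly reported a bad state. */
static bool sketch_failure_counts(const struct sketch_check *c)
{
	if (c->is_agent && c->status != SK_STATUS_L7STS)
		return false;
	return true;
}

/* Models the early return added to set_server_check_status(): agent
 * statuses below the layer 7 range are not logged as status changes. */
static bool sketch_status_is_logged(const struct sketch_check *c)
{
	if (c->is_agent && c->status < SK_STATUS_L7TOUT)
		return false;
	return true;
}
```

Under these assumptions a connection error on the agent check is neither
counted nor logged, while the same status on the regular health check is
handled as before.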
---
 doc/configuration.txt |  4 ++++
 src/checks.c          | 15 +++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 25fbf8f..8094bab 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -7700,6 +7700,10 @@ agent-check
 
     This currently has the same behaviour as "down".
 
+  Failure to connect to the agent is not considered an error as connectivity
+  is tested by the regular health check which is enabled by the "check"
+  parameter.
+
   Requires the ""agent-port" parameter to be set.
   See also the "agent-check" parameter.
 
diff --git a/src/checks.c b/src/checks.c
index cffba02..4ab29e5 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -228,6 +228,12 @@ static void set_server_check_status(struct check *check, short status, const cha
                tv_zero(&check->start);
        }
 
+       /* Failure to connect to the agent as a secondary check should not
+        * cause the server to be marked down. So only log status changes
+        * for statuses of HCHK_STATUS_L7TOUT and above. */
+       if (check == &s->agent && check->status < HCHK_STATUS_L7TOUT)
+               return;
+
        if (s->proxy->options2 & PR_O2_LOGHCHKS &&
        (((check->health != 0) && (check->result & SRV_CHK_FAILED)) ||
            ((check->health != s->rise + s->fall - 1) && (check->result & SRV_CHK_PASSED)) ||
@@ -608,6 +614,15 @@ static void check_failed(struct check *check)
 {
        struct server *s = check->server;
 
+       /* The agent secondary check should only cause a server to be marked
+        * as down if check->status is HCHK_STATUS_L7STS, which indicates
+        * that the agent returned "fail", "stopped" or "down".
+        * The implication here is that failure to connect to the agent
+        * as a secondary check should not cause the server to be marked
+        * down. */
+       if (check == &s->agent && check->status != HCHK_STATUS_L7STS)
+               return;
+
        if (check->health > s->rise) {
                check->health--; /* still good */
                s->counters.failed_checks++;
-- 
1.8.4

