[PATCH] BUG/MEDIUM: Do not consider an agent check as failed on L7 error

Simon Horman Wed, 25 Feb 2015 18:27:34 -0800

As failure to connect to the agent check is not sufficient to mark it as
failed it stands to reason that an L7 error shouldn't either.


Without this fix if an L7 error occurs, for example of connectivity to the
agent is lost immediately after establishing a connection to it, then the
agent check will be considered to have failed and thus may end up with zero
health. Once this has occurred if the primary health check also reaches
zero health, which is likely if connectivity to the server is lost, then
the server will be marked as down and not be marked as up again until a
successful agent check occurs regardless of the success of any primary
health checks.

This behaviour is not correct as a failed agent check should never cause a
server to be marked as down or by extension continue to be marked as down.

Signed-off-by: Simon Horman <[email protected]>

---

I believe that an easy way to reproduce this problem is using a
a badly behaved agent that disconnects once a connection is established.
e.g. while true; do nc -l -p 12345 -q 0; done &
---
 src/checks.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/checks.c b/src/checks.c
index ea167de..f592ee5 100644
--- a/src/checks.c
+++ b/src/checks.c
@@ -268,7 +268,7 @@ static void set_server_check_status(struct check *check, 
short status, const cha
                 * cause the server to be marked down.
                 */
                if ((!(check->state & CHK_ST_AGENT) ||
-                   (check->status >= HCHK_STATUS_L7TOUT)) &&
+                   (check->status >= HCHK_STATUS_L57DATA)) &&
                    (check->health >= check->rise)) {
                        s->counters.failed_checks++;
                        report = 1;
-- 
2.1.4

[PATCH] BUG/MEDIUM: Do not consider an agent check as failed on L7 error

Reply via email to