[PATCH] Re: Health check and flapping

contact Tue, 29 Dec 2015 01:25:00 -0800

Hi,

I was able to reproduce the health check bug and create a small patch.


Before patch:

- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 6ms, status: 1/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 4ms, status: 2/3 DOWN.
- Health check for server servers/server1 failed, reason: External check error, code: 1, check duration: 4ms, status: 2/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 3ms, status: 1/1 UP.


After patch:

- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 3ms, status: 1/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 4ms, status: 2/3 DOWN.
- Health check for server servers/server1 failed, reason: External check error, code: 1, check duration: 4ms, status: 0/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 4ms, status: 1/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 5ms, status: 2/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 4ms, status: 1/1 UP.


Here's the patch:
--- src/checks.c        2015-12-29 09:53:50.971741769 +0100
+++ checks.c    2015-12-29 09:54:19.211743665 +0100
@@ -272,11 +272,12 @@
                 * cause the server to be marked down.
                 */
                if ((!(check->state & CHK_ST_AGENT) ||
-                   (check->status >= HCHK_STATUS_L57DATA)) &&
-                   (check->health >= check->rise)) {
-                       s->counters.failed_checks++;
-                       report = 1;
-                       check->health--;
+                   (check->status >= HCHK_STATUS_L57DATA))) {
+                       if (check->health >= check->rise) {
+                               s->counters.failed_checks++;
+                               report = 1;
+                               check->health--;
+                       }
                        if (check->health < check->rise)
                                check->health = 0;
                }
--

Pierre Zemb

Hi,

I've not forgotten you, I'm just running out of time.

Baptiste


On Tue, Sep 29, 2015 at 5:43 PM,  <[email protected]> wrote:

On 2015-08-28 16:40, Baptiste wrote:


On 28 Aug 2015 15:45, <[email protected]> wrote:
 >
 > Hello,
 >
 > We have tcp-check configured on some backends, which works fine,
 > except when the service is flapping.
 >
 > If the backend server is in a transitional state, for example
 > transitionally DOWN (going up), the counter is not reset to 0 if
 > tcp-check gives a KO state between some OK states. The result is that
 > if the service is flapping, the backend becomes UP for a few seconds
 > quite often, even if the OK states are not all consecutive.
 >
 > Example of sequence with rise 3:
 >
 > KO -> 0/3
 > KO -> 0/3
 > OK -> 1/3
 > KO -> 1/3 <- should go back to 0/3
 > KO -> 1/3
 > KO -> 1/3
 > OK -> 2/3
 > KO -> 2/3
 > KO -> 2/3
 > OK -> 3/3 -> Server UP
 >
 > Is there a way to configure the counter to reset itself in case of
 > flapping?
 >
 > Thanks.

Hi there,

Thanks for reporting this behavior.

I'll have a look and come back to you.

Baptiste



Hello,

Are you able to reproduce this on your side?

Thanks
