Hi,
I was able to reproduce the health check bug (a failed check while the server is still DOWN does not reset the rise counter) and wrote a small patch.
Before patch:
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 6ms, status: 1/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 4ms, status: 2/3 DOWN.
- Health check for server servers/server1 failed, reason: External check error, code: 1, check duration: 4ms, status: 2/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 3ms, status: 1/1 UP.
After patch:
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 3ms, status: 1/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 4ms, status: 2/3 DOWN.
- Health check for server servers/server1 failed, reason: External check error, code: 1, check duration: 4ms, status: 0/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 4ms, status: 1/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 5ms, status: 2/3 DOWN.
- Health check for server servers/server1 succeeded, reason: External check passed, code: 0, check duration: 4ms, status: 1/1 UP.
Here's the patch:
--- src/checks.c 2015-12-29 09:53:50.971741769 +0100
+++ checks.c 2015-12-29 09:54:19.211743665 +0100
@@ -272,11 +272,12 @@
 	 * cause the server to be marked down.
 	 */
 	if ((!(check->state & CHK_ST_AGENT) ||
-	    (check->status >= HCHK_STATUS_L57DATA)) &&
-	    (check->health >= check->rise)) {
-		s->counters.failed_checks++;
-		report = 1;
-		check->health--;
+	    (check->status >= HCHK_STATUS_L57DATA))) {
+		if (check->health >= check->rise) {
+			s->counters.failed_checks++;
+			report = 1;
+			check->health--;
+		}
 		if (check->health < check->rise)
 			check->health = 0;
 	}
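
To make the difference easier to see outside of haproxy, here is a rough standalone model of the counter logic. It is only a sketch: struct check_model, model_success(), model_failure() and run_checks() are names I made up for illustration, the agent/status condition is left out, and the success path is my simplified assumption of how health climbs towards rise. Only the two failure branches mirror the old and new code from the patch.

```c
/* Hypothetical, simplified stand-in for the fields the patch touches.
 * This is not haproxy's real struct check. */
struct check_model {
	int health; /* 0 .. rise + fall - 1; server is UP when health >= rise */
	int rise;
	int fall;
};

/* Assumed success path: count up, and jump to the maximum once UP. */
static void model_success(struct check_model *c)
{
	int max = c->rise + c->fall - 1;

	if (c->health < max)
		c->health++;
	if (c->health >= c->rise)
		c->health = max;
}

/* Failure path, before (patched == 0) and after (patched != 0) the patch. */
static void model_failure(struct check_model *c, int patched)
{
	if (!patched) {
		/* Old code: nothing happens at all while the server is DOWN. */
		if (c->health >= c->rise) {
			c->health--;
			if (c->health < c->rise)
				c->health = 0;
		}
	} else {
		/* New code: the reset also applies while the server is DOWN. */
		if (c->health >= c->rise)
			c->health--;
		if (c->health < c->rise)
			c->health = 0;
	}
}

/* Apply a string of results ('P' = passed, anything else = failed)
 * and return the resulting health value. */
static int run_checks(struct check_model *c, const char *results, int patched)
{
	for (; *results; results++) {
		if (*results == 'P')
			model_success(c);
		else
			model_failure(c, patched);
	}
	return c->health;
}
```

With rise 3 and fall 1, feeding "PPF" to the unpatched variant leaves health at 2, so the next passed check brings the server UP. The patched variant drops health back to 0 on the failure, so three more passed checks are needed, which matches the two log excerpts above.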