Re: Server marked as "DOWN (agent)" without the agent issuing "down"

Cornelius Riemenschneider Tue, 07 Oct 2014 06:19:23 -0700

Hello,

I investigated the issue a bit further.


We use both health checks and agent checks: the health check reports the usual
up/down/connection failed status, and the agent check provides us with a
dynamic weight.
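
For readers unfamiliar with the agent side of this setup, here is a minimal
sketch of such a weight agent in Python. It assumes the agent only answers with
a single percentage line such as "75%", which HAProxy applies to the server's
configured weight. The port 9999, the `current_load()` helper, the server line
in the comment and the load-to-weight policy are all hypothetical placeholders,
not our actual agent.

```python
import socketserver

# Illustrative (hypothetical) HAProxy server line this agent would pair with:
#   server app1 10.0.0.1:8080 check agent-check agent-port 9999 agent-inter 2s


def agent_reply(load: float) -> str:
    """Map a load fraction (0.0 = idle, 1.0 = saturated) to a reply line.

    The mapping is a hypothetical policy: an idle server gets 100% of its
    configured weight, a saturated one gets 0%.
    """
    load = min(max(load, 0.0), 1.0)        # clamp to [0, 1]
    percent = round((1.0 - load) * 100)    # invert: less load -> more weight
    return f"{percent}%\n"                 # one ASCII line, newline-terminated


def current_load() -> float:
    # Hypothetical placeholder: a real agent would query the application.
    return 0.25


class AgentHandler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # HAProxy connects, reads one line, and closes the connection.
        self.wfile.write(agent_reply(current_load()).encode("ascii"))


if __name__ == "__main__":
    # Port 9999 is an assumption; it must match "agent-port" in the config.
    with socketserver.TCPServer(("0.0.0.0", 9999), AgentHandler) as server:
        server.serve_forever()
```

Note that the agent is a separate listener from the application, but if it
runs inside the same JVM it stalls together with it during a long GC pause.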

We enter the DOWN (agent) state when our (Java-based) server goes into a long
garbage collection, which stalls it for anywhere from 40 seconds to 5 minutes.
That is a bug in itself, but it has been dealt with.
During such a pause the JVM does not answer connection requests (how exactly, I
do not know), so both the health check and the agent check fail because they
cannot connect to the server (which is expected).
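
To see what the checks see during such a pause, one could probe the agent port
directly and report whether a reply line, a refusal, or a timeout came back.
This is a hypothetical diagnostic sketch in Python (the `probe_agent` name and
the 2-second default timeout are assumptions, not anything HAProxy provides);
it illustrates that a stalled server shows up as a timeout or a refusal rather
than as an explicit "down" reply.

```python
import socket


def probe_agent(host: str, port: int, timeout: float = 2.0) -> str:
    """Return the agent's reply line, or a label saying why none arrived."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            data = sock.recv(256)       # agent replies with one short line
    except socket.timeout:
        return "timeout"                # no connection or no reply in time
    except ConnectionRefusedError:
        return "refused"                # nothing listening on the port
    except OSError as exc:
        return f"error: {exc}"          # any other socket-level failure
    if not data:
        return "closed"                 # peer closed without sending a line
    return data.decode("ascii", "replace").strip()
```

A listener that never calls `accept()` (much like a process frozen in GC) lets
the kernel complete the TCP handshake but never sends a line, so the probe
reports "timeout".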

The issue now seems to be that, somehow, the down state is not reset after the
health check comes back up: the LastChk column says "Checked", and manual
verification shows the health check is back to 200. I suspect haproxy might be
stuck in the agent-down state, expecting an "up" from the agent, which will
never come because the agent did not cause the down state in the first place.
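
The suspicion can be stated as a toy state model. This is purely a hypothetical
sketch of the behaviour we think we are seeing, not HAProxy's code: a failed
agent connection sets a sticky agent-down flag, and only an explicit "up" from
the agent clears it, so weight replies and passing health checks never bring
the server back.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ServerState:
    """Toy model of the *suspected* behaviour, not HAProxy's implementation."""
    health_ok: bool = True
    agent_down: bool = False    # hypothetical sticky flag
    weight_pct: int = 100

    def on_health_check(self, ok: bool) -> None:
        self.health_ok = ok

    def on_agent_check(self, reply: Optional[str]) -> None:
        if reply is None:
            # Suspected bug: a connection failure sets the agent-down flag.
            self.agent_down = True
            return
        reply = reply.strip()
        if reply == "down":
            self.agent_down = True
        elif reply == "up":
            self.agent_down = False     # the only way out in this model
        elif reply.endswith("%"):
            # A weight reply updates the weight but leaves agent_down alone.
            self.weight_pct = int(reply[:-1])

    @property
    def status(self) -> str:
        if self.agent_down:
            return "DOWN (agent)"
        return "UP" if self.health_ok else "DOWN"
```

Replaying the GC sequence (both checks fail, then the health check passes and
the agent replies "75%") leaves the model in "DOWN (agent)"; only
`on_agent_check("up")` restores "UP".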

Could that be a possibility?

We also do not see the issue with 1.5-dev22, which has been stable for us for 
some months.

> Do you think it would be enough if we add in the doc that the stats page also 
> reports weight 0 as "DRAIN" ?

Yep, that sounds good :)

Regards,
Cornelius Riemenschneider

--
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: [email protected]
https://www.itscope.com
Commercial register: Amtsgericht (Local Court) Mannheim, HRB 232782
Registered office: Karlsruhe
Managing directors: Alexander Münkel, Benjamin Mund, Stefan Reger
