Hello, I investigated the issue a bit further.
We use both health checks and agent checks: the health check reports the usual up/down/connection failed, and the agent check provides us with a dynamic weight.

We enter the DOWN (agent) state when our Java-based server goes into a long garbage collection, which stops the server for anywhere from 40 seconds to 5 minutes. This is a bug in itself, but it has been dealt with. During such a pause the JVM does not answer connection requests (how exactly, I do not know), so both the health check and the agent check fail because they cannot connect to the server, which is expected.

The issue now seems to be that the DOWN state is somehow not reset after the health check comes back up. The LastChk column says "Checked", and by manual verification the health check is back to 200, but I think haproxy might be stuck in the agent-down state and expect an "up" from the agent. That "up" will never come, because the agent did not cause the down state in the first place. Could that be a possibility?

We also do not see the issue with 1.5-dev22, which has been stable for us for some months.

> Do you think it would be enough if we add in the doc that the stats page also
> reports weight 0 as "DRAIN" ?

Yep, that sounds good :)

Regards,
Cornelius Riemenschneider
-- 
ITscope GmbH
Ludwig-Erhard-Allee 20
76131 Karlsruhe
Email: [email protected]
https://www.itscope.com
Commercial register: AG Mannheim, HRB 232782
Registered office: Karlsruhe
Managing directors: Alexander Münkel, Benjamin Mund, Stefan Reger

