Re: Strange 'Down' status on backend

Willy Tarreau Fri, 28 Dec 2012 11:26:56 -0800

Hi Reinout,

On Fri, Dec 28, 2012 at 07:46:42PM +0100, Reinout Verkerk | Trilex wrote:
> Hi Jonathan,
> 
> Thank you for your reply, but I thought about that, but I can not find any
> evidence that that's the case. If I look at my last log, I restarted
> haproxy (after upgrading to the latest dev17) at 18:20LT. The first warning
> after that is the replication check failing at 19:37. Loglevel is set to
> notice, maybe that has something to do with it?
> 
> Dec 28 18:20:03 localhost haproxy[19206]: Proxy db02_replication started.
> Dec 28 19:37:13 localhost haproxy[19207]: Server db02_replication/db02 is
> DOWN, reason: Layer7 check passed, code: 200, info: "OK", check duration:
> 25ms. 0 active and 0 backup servers left. 0 sessions active, 0 requeued, 0
> remaining in queue.


That's amazingly strange indeed. I suspect that a network error occurs
from time to time and is reported to the upper layers after the response
has been received. For example, if the server sends its response without
reading the request, a reset will sometimes be emitted. It's possible that
haproxy sometimes gets two consecutive recv() results:
  - one containing the 200 OK response
  - a reset

The reset would be translated into an error, but it's surprising that the
contents are correctly decoded.
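
To illustrate the sequence described above, here is a minimal Python sketch.
It is not haproxy's code, only a small model of the suspected situation: a
server that answers without ever reading the request and then closes, and a
client that keeps calling recv() until the connection ends. The names
rude_server, read_until_close and demo, the response text and the sleep
delays are all assumptions made for this example. Whether the close really
produces a reset, and whether the client sees the data before the error,
depends on the OS and on timing, so the demo may print different results on
different systems.

```python
import socket
import threading
import time

# Hypothetical health-check response, chosen only for this illustration.
RESPONSE = b"HTTP/1.0 200 OK\r\n\r\nOK"


def rude_server(listener):
    """Accept one connection, reply without reading the request, then close."""
    conn, _ = listener.accept()
    time.sleep(0.2)          # give the request time to arrive; it is never read
    conn.sendall(RESPONSE)
    conn.close()             # closing with unread data may emit a reset (RST)


def read_until_close(sock, bufsize=4096):
    """Call recv() until EOF or a reset; return (data, reset_seen)."""
    chunks = []
    reset_seen = False
    while True:
        try:
            chunk = sock.recv(bufsize)
        except ConnectionResetError:
            reset_seen = True    # the "second recv()": an error after the data
            break
        if not chunk:            # orderly shutdown, no reset
            break
        chunks.append(chunk)
    return b"".join(chunks), reset_seen


def demo():
    """Run the rude server on loopback and report what the client observed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    thread = threading.Thread(target=rude_server, args=(listener,))
    thread.start()

    client = socket.create_connection(listener.getsockname())
    client.sendall(b"GET /check HTTP/1.0\r\n\r\n")
    time.sleep(0.5)              # let both the response and any reset arrive
    data, reset_seen = read_until_close(client)

    client.close()
    thread.join()
    listener.close()
    return data, reset_seen


if __name__ == "__main__":
    data, reset_seen = demo()
    print("received:", data)
    print("reset seen after data:", reset_seen)
```

If reset_seen comes back true together with a complete response, a checker
that decodes the response first and only then looks at the error would see
both a valid "200 OK" and a failure for the same check, which would be
consistent with the log line quoted above.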

Is it possible to take a network capture of this health check when it
fails? Please use "tcpdump -s0 -npi eth0 -w trace.cap" for this to
ensure that packets are not truncated. Then, once the failure shows up
in the logs, you can filter on the interesting session.

This will help a lot in understanding which condition causes this issue,
and then we can see how to fix it or work around it.

Regards,
Willy
