I've been monitoring our service availability check (an HTTP HEAD of a resource that truly reflects the availability status of the application). Under normal circumstances, the check takes 2-3 seconds. We found periods where the application would take 15+ seconds and fail (I did not capture the HTTP code, but from what I've been looking through I'm fairly sure it was in the 500 series). These failure periods match the times when haproxy was reporting timeouts of 1002ms. So it looks like haproxy is doing its job. Is this then a bug in the logging of the timeout value (reporting 1002ms vs. 15000+ms)?
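
For anyone who wants to capture the status code next time, here is a minimal sketch of a timed HEAD probe. It is not the monitoring tool used above; the function names `timed_head` and `classify`, the 20-second client timeout, and the 15-second "slow" threshold (taken from the 15+ seconds observed above) are assumptions for illustration only.

```python
import time
import urllib.error
import urllib.request


def timed_head(url, timeout=20.0):
    """Send one HTTP HEAD request; return (status_or_None, elapsed_seconds)."""
    request = urllib.request.Request(url, method="HEAD")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; keep the code for the log.
        status = err.code
    except (urllib.error.URLError, OSError):
        # Connection failure or timeout: no HTTP status is available.
        status = None
    return status, time.monotonic() - start


def classify(status, elapsed, slow_after=15.0):
    """Label one probe result; slow_after is an assumed threshold in seconds."""
    if status is None or status >= 500:
        return "failed"
    if elapsed >= slow_after:
        return "slow"
    return "ok"
```

Calling `timed_head()` on the availability URL at a fixed interval and logging the status, the elapsed time, and a timestamp would make it possible to line the results up against the haproxy log entries.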

We haven't had any problems since 25 May, but we're keeping watch.

- Kevin

On 5/25/12 11:18 AM, Lange, Kevin M. (GSFC-423.0)[RAYTHEON COMPANY] wrote:
Willy,
I'll try the patch, but not until next week because of the holiday weekend. I don't want to make a significant change that I would have to support over the long weekend. I'm capturing tcpdump between the SLB and the three backends. I'd like to have a capture during an "outage". I expect to see something today, and I'll send it to you.
- Kevin


On May 25, 2012, at 2:12 AM, Willy Tarreau wrote:

Hi again Kevin,

Well, I suspect that there might be a corner case with the bug I fixed
which might have caused what you observed.

The "timeout connect" is computed from the last expire date. Since
"timeout check" was added upon connection establishment but the task
was woken too late, after a first check failure that was reported
too late, the next check timeout can end up shortened.
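
A rough sketch of the arithmetic described above, not haproxy's actual code: the function names and the millisecond values are assumptions chosen only to show how anchoring a deadline on a stale expire date eats into the next timeout.

```python
def remaining_from_last_expire(last_expire_ms, now_ms, timeout_ms):
    """Time left when the next deadline is anchored on the previous expire date."""
    next_expire_ms = last_expire_ms + timeout_ms
    # If the task woke up late, part of the timeout has already elapsed.
    return max(next_expire_ms - now_ms, 0)


def remaining_from_now(now_ms, timeout_ms):
    """Time left when the next deadline is anchored on the current date."""
    next_expire_ms = now_ms + timeout_ms
    return next_expire_ms - now_ms


# Hypothetical numbers: the previous deadline was at t=10000 ms, the task
# only ran at t=14000 ms (4000 ms late), and the timeout is 5000 ms.
late_case = remaining_from_last_expire(10_000, 14_000, 5_000)
fresh_case = remaining_from_now(14_000, 5_000)
```

With these made-up numbers the first computation leaves 1000 ms of a 5000 ms timeout, while the second leaves the full 5000 ms. The numbers are illustrative and are not meant to reproduce the 1002ms seen in the logs.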

It's still unclear to me how it is possible that the check timeout is
reported this small, considering that it's updated once the connect
succeeds. But performing computations in the past is never a good way
to have something reliable.

Could you please apply the attached fix for the bug I mentioned in
my previous mail, to see if the issue is still present? After all, I
would not be totally surprised if this bug has nasty side effects
like this.

Thanks,
Willy

<0001-BUG-MINOR-checks-expire-on-timeout.check-if-smaller-.patch>

Kevin Lange
[email protected] <mailto:[email protected]>
W: +1 (301) 851-8450
Raytheon  | NASA  | ECS Evolution Development Program
https://www.echo.com  | https://www.raytheon.com

