I suspected a bug in the reporting, since we've experimented with numerous
values for the various timeouts, but I wanted your thoughts.
This is v1.4.15.

[root@opsslb1 log]# haproxy -v
HA-Proxy version 1.4.15 2011/04/08
Copyright 2000-2010 Willy Tarreau <[email protected]>

On May 24, 2012, at 5:17 PM, Willy Tarreau wrote:

> Hi Kevin,
> 
> On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M. 
> (GSFC-423.0)[RAYTHEON COMPANY] wrote:
>> Hi,
>> We're having odd behavior (apparently have always but didn't realize it), 
>> where our backend httpchks "time out":
>> 
>> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is 
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup 
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is 
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup 
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> 
>> 
>> We've been playing with the timeout values, and we don't know what is 
>> controlling the "Layer7 timeout, check duration: 1002ms".  The backend 
>> service availability check (by hand) typically takes 2-3 seconds on average.
>> Here is the relevant haproxy setup.
>> 
>> #---------------------------------------------------------------------
>> # Global settings
>> #---------------------------------------------------------------------
>> global
>>    log-send-hostname opsslb1
>>    log         127.0.0.1 local1 info
>> #    chroot      /var/lib/haproxy
>>    pidfile     /var/run/haproxy.pid
>>    maxconn     1024
>>    user        haproxy
>>    group       haproxy
>>    daemon
>> 
>> #---------------------------------------------------------------------
>> # common defaults that all the 'listen' and 'backend' sections will
>> # use if not designated in their block
>> #---------------------------------------------------------------------
>> defaults
>>    mode        http
>>    log         global
>>    option      dontlognull
>>    option      httpclose
>>    option      httplog
>>    option      forwardfor
>>    option      redispatch
>>    timeout connect 500 # default 10 second time out if a backend is not found
>>    timeout client 50000
>>    timeout server 3600000
>>    maxconn     60000
>>    retries     3
>> 
>> frontend webapp_ops_ft
>> 
>>        bind 10.0.40.209:80
>>        default_backend webapp_ops_bk
>> 
>> backend webapp_ops_bk
>>        balance roundrobin
>>        option httpchk HEAD /app/availability
>>        reqrep ^Host:.* Host:\ webapp.example.com
>>        server webapp_ops1 opsapp1.ops.example.com:41000 check inter 30000
>>        server webapp_ops2 opsapp2.ops.example.com:41000 check inter 30000
>>        server webapp_ops3 opsapp3.ops.example.com:41000 check inter 30000
>>        timeout check 15000
>>        timeout connect 15000
> 
> This is quite strange. The timeout is defined first by "timeout check" or if
> unset, by "inter". So in your case you should observe a 15sec timeout, not
> one second.
> 
> What exact version is this ? (haproxy -vv)
> 
> It looks like a bug, however it could be a bug in the timeout handling as
> well as in the reporting. I'd suspect the latter since you're saying that
> the service takes 2-3 sec to respond and you don't seem to see errors
> that often.
> 
> Regards,
> Willy
> 
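
To make the expectation concrete, here is a minimal Python sketch of the rule
as Willy states it above: the check timeout comes from "timeout check" and
falls back to "inter" when that is unset. This only models the stated rule; it
is not HAProxy's code, and the function name effective_check_timeout_ms is
hypothetical. The values are the ones from the webapp_ops_bk backend and the
log lines in this thread.

```python
from typing import Optional


def effective_check_timeout_ms(timeout_check_ms: Optional[int], inter_ms: int) -> int:
    """Model of the rule quoted above: use "timeout check" if set, else "inter".

    Hypothetical helper for illustration only, not part of HAProxy.
    """
    if timeout_check_ms is not None:
        return timeout_check_ms
    return inter_ms


# Values from the webapp_ops_bk backend: "timeout check 15000", "check inter 30000".
expected_ms = effective_check_timeout_ms(timeout_check_ms=15000, inter_ms=30000)

# One of the "check duration" values reported in the log lines.
observed_ms = 1002

print(f"expected check timeout: {expected_ms} ms")
print(f"observed check duration: {observed_ms} ms")
print(f"check failed before the expected timeout: {observed_ms < expected_ms}")
```

Under that rule the checks should be allowed 15 seconds, so a "Layer7 timeout"
reported at about 1 second does not match either configured value.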

Kevin Lange
[email protected]
[email protected]
W: +1 (301) 851-8450
Raytheon  | NASA  | ECS Evolution Development Program
https://www.echo.com  | https://www.raytheon.com
