I thought it was a bug in the reporting, considering we've played with numerous values for the various timeouts as an experiment, but wanted your thoughts. This is v1.4.15.
[root@opsslb1 log]# haproxy -v
HA-Proxy version 1.4.15 2011/04/08
Copyright 2000-2010 Willy Tarreau <[email protected]>

On May 24, 2012, at 5:17 PM, Willy Tarreau wrote:

> Hi Kevin,
>
> On Thu, May 24, 2012 at 04:04:03PM -0500, Lange, Kevin M.
> (GSFC-423.0)[RAYTHEON COMPANY] wrote:
>> Hi,
>> We're having odd behavior (apparently have always but didn't realize it),
>> where our backend httpchks "time out":
>>
>> May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 04:41:55 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 2 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 08:38:10 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 08:53:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:32:20 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
>> servers left. 3 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:35:01 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:41:37 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops2 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 1 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 09:56:41 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops3 is
>> DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and 0 backup
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>> May 24 10:01:45 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 is
>> DOWN, reason: Layer7 timeout, check duration: 1001ms. 0 active and 0 backup
>> servers left. 0 sessions active, 0 requeued, 0 remaining in queue.
>>
>>
>> We've been playing with the timeout values, and we don't know what is
>> controlling the "Layer7 timeout, check duration: 1002ms". The backend
>> service availability check (by hand) typically takes 2-3 seconds on average.
>> Here is the relevant haproxy setup.
>>
>> #---------------------------------------------------------------------
>> # Global settings
>> #---------------------------------------------------------------------
>> global
>>     log-send-hostname opsslb1
>>     log 127.0.0.1 local1 info
>> #   chroot /var/lib/haproxy
>>     pidfile /var/run/haproxy.pid
>>     maxconn 1024
>>     user haproxy
>>     group haproxy
>>     daemon
>>
>> #---------------------------------------------------------------------
>> # common defaults that all the 'listen' and 'backend' sections will
>> # use if not designated in their block
>> #---------------------------------------------------------------------
>> defaults
>>     mode http
>>     log global
>>     option dontlognull
>>     option httpclose
>>     option httplog
>>     option forwardfor
>>     option redispatch
>>     timeout connect 500 # default 10 second time out if a backend is not found
>>     timeout client 50000
>>     timeout server 3600000
>>     maxconn 60000
>>     retries 3
>>
>> frontend webapp_ops_ft
>>
>>     bind 10.0.40.209:80
>>     default_backend webapp_ops_bk
>>
>> backend webapp_ops_bk
>>     balance roundrobin
>>     option httpchk HEAD /app/availability
>>     reqrep ^Host:.* Host:\ webapp.example.com
>>     server webapp_ops1 opsapp1.ops.example.com:41000 check inter 30000
>>     server webapp_ops2 opsapp2.ops.example.com:41000 check inter 30000
>>     server webapp_ops3 opsapp3.ops.example.com:41000 check inter 30000
>>     timeout check 15000
>>     timeout connect 15000
>
> This is quite strange. The timeout is defined first by "timeout check" or if
> unset, by "inter". So in your case you should observe a 15sec timeout, not
> one second.
>
> What exact version is this ? (haproxy -vv)
>
> It looks like a bug, however it could be a bug in the timeout handling as
> well as in the reporting. I'd suspect the latter since you're saying that
> the service takes 2-3 sec to respond and you don't seem to see errors
> that often.
>
> Regards,
> Willy
>

Kevin Lange
[email protected]
[email protected]
W: +1 (301) 851-8450
Raytheon | NASA | ECS Evolution Development Program
https://www.echo.com | https://www.raytheon.com
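
For anyone who wants to check their own logs against the rule Willy describes in the quoted reply ("timeout check" first, "inter" if it is unset), here is a minimal Python sketch. The function names are hypothetical, and the fallback logic only mirrors that description; it is not taken from the haproxy source and may not match what 1.4.15 actually does. The regular expression assumes the "check duration: NNNNms" wording seen in the quoted log lines.

```python
import re

# Hypothetical helper: encodes the rule as described in the quoted reply,
# NOT haproxy's actual implementation.
def expected_check_timeout_ms(timeout_check_ms, inter_ms):
    """Return 'timeout check' if it is set, otherwise fall back to 'inter'."""
    return timeout_check_ms if timeout_check_ms is not None else inter_ms

# Matches the "check duration: NNNNms" field in the quoted log lines.
DURATION_RE = re.compile(r"check duration: (\d+)ms")

def check_durations_ms(log_text):
    """Extract every reported check duration (in ms) from haproxy log text."""
    return [int(value) for value in DURATION_RE.findall(log_text)]

def durations_below(durations_ms, timeout_ms):
    """Return the durations that are shorter than the expected timeout."""
    return [d for d in durations_ms if d < timeout_ms]

if __name__ == "__main__":
    # One of the log lines from the thread, unwrapped onto a single line.
    sample = (
        "May 24 04:03:33 opsslb1 haproxy[4594]: Server webapp_ops_bk/webapp_ops1 "
        "is DOWN, reason: Layer7 timeout, check duration: 1002ms. 0 active and "
        "0 backup servers left. 1 sessions active, 0 requeued, 0 remaining in queue."
    )
    # Values from the posted backend: timeout check 15000, inter 30000.
    expected = expected_check_timeout_ms(15000, 30000)
    for duration in durations_below(check_durations_ms(sample), expected):
        print(f"timeout reported after {duration}ms, expected about {expected}ms")
```

Run against the full log, every "Layer7 timeout" entry above would be flagged, since each reports roughly 1000ms against an expected 15000ms. That matches the mismatch discussed in the thread but does not by itself say whether the fault is in the timeout handling or in the reporting.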

