Thanks for the fast and helpful replies, Willy. I hadn't realized that a request stays tied to a server even before that server has responded successfully. This all makes sense now. I'll try setting "option redispatch", which I assume will solve my problems. In that case I won't need an early redispatch when the server state changes, though I guess it would make things slightly faster.
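
For reference, a minimal sketch of the kind of configuration under discussion. The backend name lists_ws and the server name ws_2 are taken from the log quoted below; the second server ws_3, the addresses, and the health-check values are placeholders for illustration only, and the frontend and timeout settings are left out:

```
defaults
    mode http
    retries 3
    option redispatch    # allow the last retry to go to another server

backend lists_ws
    balance leastconn
    # Addresses and check settings are placeholders, not the real setup.
    server ws_2 192.0.2.12:8080 check inter 2000 rise 2 fall 3
    server ws_3 192.0.2.13:8080 check inter 2000 rise 2 fall 3   # hypothetical second server
```

As described in the quoted reply, the earlier retries would still go to the same server; only the final attempt can be redispatched.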

On Fri, May 14, 2010 at 2:30 PM, Willy Tarreau <[email protected]> wrote:
> Hi Malcolm,
>
> On Fri, May 14, 2010 at 12:07:53PM -0700, Malcolm Handley wrote:
> > Hi, everyone.
> >
> > I'm having some trouble with the routing of requests to servers within a
> > backend.
> >
> > Firstly, although I have "retries 3" in the defaults section of my config
> > file I'm not seeing any evidence of retries. If a server is down but has not
> > been detected as down by haproxy then a request may still get sent to it and
> > a failure returned to the client. (This is the first bold line in the log
> > below.)
>
> Yes, this is expected if your "retries" value is not large enough to cover
> the time to detect that the server is down. Also, the retries are only
> performed on the same server. If you want the request to be redispatched
> to another server after the last attempt, you should use "option
> redispatch".
>
> > Second, occasionally haproxy seems to route a request to a server that it
> > knows is down. (This is the second bolded section below.)
>
> No, if you look more closely, you'll see that the request was received at
> 04:03:20.475, *before* the server was marked down (04:03:21), and failed
> last attempt at 04:03:23. Since the request did not switch to another server
> on the last retry, I think you did not have "option redispatch" enabled.
>
> > I could understand both of these if I were using cookies for routing and had
> > not enabled redispatching. But I'm using "balance leastconn" with no mention
> > of cookies in the config file. What else might I be doing that would force
> > haproxy to use a downed backend and not retry requests?
>
> Well, be careful, retries are always performed on the same server, except the
> last one which can be redispatched. I will study if we could force an early
> redispatch in case the server changes state during retries, but there's nothing
> certain in this area.
>
> Regards,
> Willy
>
> > -----
> > May 14 04:03:03 prod_lb0 haproxy[28398]: 67.112.125.46:61758 [14/May/2010:04:02:44.517] ws_in ws_in/<NOSRV> -1/-1/-1/-1/18609 400 187 - - CR-- 7/7/0/0/0 0/0 "<BADREQ>"
> > May 14 04:03:17 prod_lb0 haproxy[28398]: *127.0.0.1:58921 [14/May/2010:04:03:08.486] ws_in lists_ws/ws_2 51/0/0/-1/9049 502 204 - - SH-- 7/7/2/1/0 0/0 "GET /-/ping HTTP/1.1"*
> > May 14 04:03:17 prod_lb0 haproxy[28074]: 67.112.125.46:59661 [14/May/2010:03:04:22.220] ws_in lists_ws/ws_2 55/0/0/31/3535558 101 27943453 - - ---- 2/2/2/2/0 0/0 "GET /app/etherlist/socket?session_id=9680378170&profiler=1 HTTP/1.1"
> > May 14 04:03:17 prod_lb0 haproxy[28398]: 99.66.213.198:58624 [14/May/2010:03:52:18.455] ws_in lists_ws/ws_2 455/0/0/2/659334 101 3348145 - - ---- 6/6/1/0/0 0/0 "GET /app/etherlist/socket?session_id=9385884222&profiler=1 HTTP/1.1"
> > May 14 04:03:17 prod_lb0 haproxy[28074]: 98.210.108.197:43537 [14/May/2010:03:04:22.477] ws_in lists_ws/ws_2 165/0/0/2/3535454 101 8809800 - - ---- 1/1/1/1/0 0/0 "GET /-/socket?session_id=9646753365 HTTP/1.1"
> > May 14 04:03:18 prod_lb0 haproxy[28074]: 70.36.139.123:56872 [14/May/2010:03:04:22.511] ws_in lists_ws/ws_2 88/0/0/2/3535503 101 25165709 - - ---- 0/0/0/0/0 0/0 "GET /-/socket?session_id=9669242145 HTTP/1.1"
> > May 14 04:03:21 prod_lb0 haproxy[28398]: *Server lists_ws/ws_2 is DOWN, reason: Layer4 connection problem, info: "Connection refused", check duration: 0ms.*
> > May 14 04:03:23 prod_lb0 haproxy[28398]: *127.0.0.1:58965 [14/May/2010:04:03:20.475] ws_in lists_ws/ws_2 10/0/-1/-1/3032 503 212 - - SC-- 8/8/2/0/3 0/0 "GET /-/ping HTTP/1.1"*
> > May 14 04:03:29 prod_lb0 haproxy[28398]: *Server lists_ws/ws_2 is UP, reason: Layer4 check passed, check duration: 0ms.*
> > May 14 04:03:34 prod_lb0 haproxy[28398]: 99.66.213.198:59005 [14/May/2010:04:02:43.444] ws_in ws/ws_1 15/0/1/30/50569 304 296 - - cD-- 9/9/4/2/0 0/0 "GET /-/static/luna/browser/images/loading.gif HTTP/1.1"
> -------

