Thanks for the fast and helpful replies, Willy.

I hadn't realized that a request was connected to a server even before the
server had responded successfully. This all makes sense now. I'll try
setting "option redispatch". I assume that that will solve my problems. In
that case I won't have any need to force an early redispatch if the server
state changes, though I guess it would make things slightly faster.

On Fri, May 14, 2010 at 2:30 PM, Willy Tarreau <[email protected]> wrote:

> Hi Malcolm,
>
> On Fri, May 14, 2010 at 12:07:53PM -0700, Malcolm Handley wrote:
> > Hi, everyone.
> >
> > I'm having some trouble with the routing of requests to servers within a
> > backend.
> >
> > Firstly, although I have "retries 3" in the defaults section of my config
> > file I'm not seeing any evidence of retries. If a server is down but has
> not
> > been detected as down by haproxy then a request may still get sent to it
> and
> > a failure returned to the client. (This is the first bold line in the log
> > below.)
>
> Yes, this is expected if your "retries" value is not large enough to cover
> the time to detect that the server is down. Also, the retries are only
> performed on the same server. If you want the request to be redispatched
> to another server after the last attempt, you should use "option
> redispatch".
>
> > Second, occasionally haproxy seems to route a request to a server that it
> > knows is down. (This is the second bolded section below.)
>
> No, if you look more closely, you'll see that the request was received at
> 04:03:20.475, *before* the server was marked down (04:03:21), and failed
> last attempt at 04:03:23. Since the request did not switch to another
> server
> on the last retry, I think you did not have "option redispatch" enabled.
>
> > I could understand both of these if I were using cookies for routing and
> had
> > not enabled redispatching. But I'm using "balance leastconn" with no
> mention
> > of cookies in the config file. What else might I be doing that would
> force
> > haproxy to use a downed backend and not retry requests?
>
> Well, be careful, retries are always performed on the same server, except
> the
> last one which can be redispatched. I will study if we could force an early
> redispatch in case the server changes state during retries, but there's
> nothing
> certain in this area.
>
> Regards,
> Willy
>
> -----
> > May 14 04:03:03 prod_lb0 haproxy[28398]: 67.112.125.46:61758
> > [14/May/2010:04:02:44.517]
> > ws_in ws_in/<NOSRV> -1/-1/-1/-1/18609 400 187 - - CR-- 7/7/0/0/0 0/0
> > "<BADREQ>"
> > May 14 04:03:17 prod_lb0 haproxy[28398]: *127.0.0.1:58921
> > [14/May/2010:04:03:08.486]
> > ws_in lists_ws/ws_2 51/0/0/-1/9049 502 204 - - SH-- 7/7/2/1/0 0/0 "GET
> > /-/ping HTTP/1.1"*
> > May 14 04:03:17 prod_lb0 haproxy[28074]: 67.112.125.46:59661
> > [14/May/2010:03:04:22.220]
> > ws_in lists_ws/ws_2 55/0/0/31/3535558 101 27943453 - - ---- 2/2/2/2/0 0/0
> > "GET /app/etherlist/socket?session_id=9680378170&profiler=1 HTTP/1.1"
> > May 14 04:03:17 prod_lb0 haproxy[28398]: 99.66.213.198:58624
> > [14/May/2010:03:52:18.455]
> > ws_in lists_ws/ws_2 455/0/0/2/659334 101 3348145 - - ---- 6/6/1/0/0 0/0
> "GET
> > /app/etherlist/socket?session_id=9385884222&profiler=1 HTTP/1.1"
> > May 14 04:03:17 prod_lb0 haproxy[28074]: 98.210.108.197:43537
> > [14/May/2010:03:04:22.477]
> > ws_in lists_ws/ws_2 165/0/0/2/3535454 101 8809800 - - ---- 1/1/1/1/0 0/0
> > "GET /-/socket?session_id=9646753365 HTTP/1.1"
> > May 14 04:03:18 prod_lb0 haproxy[28074]: 70.36.139.123:56872
> > [14/May/2010:03:04:22.511]
> > ws_in lists_ws/ws_2 88/0/0/2/3535503 101 25165709 - - ---- 0/0/0/0/0 0/0
> > "GET /-/socket?session_id=9669242145 HTTP/1.1"
> > May 14 04:03:21 prod_lb0 haproxy[28398]: *Server lists_ws/ws_2 is DOWN,
> > reason: Layer4 connection problem, info: "Connection refused", check
> > duration: 0ms.*
> > May 14 04:03:23 prod_lb0 haproxy[28398]: *127.0.0.1:58965
> > [14/May/2010:04:03:20.475]
> > ws_in lists_ws/ws_2 10/0/-1/-1/3032 503 212 - - SC-- 8/8/2/0/3 0/0 "GET
> > /-/ping HTTP/1.1"*
> > May 14 04:03:29 prod_lb0 haproxy[28398]: *Server lists_ws/ws_2 is UP,
> > reason: Layer4 check passed, check duration: 0ms.*
> > May 14 04:03:34 prod_lb0 haproxy[28398]: 99.66.213.198:59005
> > [14/May/2010:04:02:43.444]
> > ws_in ws/ws_1 15/0/1/30/50569 304 296 - - cD-- 9/9/4/2/0 0/0 "GET
> > /-/static/luna/browser/images/loading.gif HTTP/1.1"
> -------
>
>

Reply via email to