Hello! On Tue, May 21, 2013 at 04:16:05PM +0400, Dmitry Popov wrote:
> On Mon, 20 May 2013 21:11:03 +0400 > Maxim Dounin <mdou...@mdounin.ru> wrote: > > > Patch: > > > > diff --git a/src/http/modules/ngx_http_upstream_least_conn_module.c > > b/src/http/modules/ngx_http_upstream_least_conn_module.c > > --- a/src/http/modules/ngx_http_upstream_least_conn_module.c > > +++ b/src/http/modules/ngx_http_upstream_least_conn_module.c > > @@ -282,7 +282,10 @@ ngx_http_upstream_get_least_conn_peer(ng > > } > > > > best->current_weight -= total; > > - best->checked = now; > > + > > + if (now - best->checked > best->fail_timeout) { > > + best->checked = now; > > + } > > > > pc->sockaddr = best->sockaddr; > > pc->socklen = best->socklen; > > diff --git a/src/http/ngx_http_upstream_round_robin.c > > b/src/http/ngx_http_upstream_round_robin.c > > --- a/src/http/ngx_http_upstream_round_robin.c > > +++ b/src/http/ngx_http_upstream_round_robin.c > > @@ -523,7 +523,10 @@ ngx_http_upstream_get_peer(ngx_http_upst > > rrp->tried[n] |= m; > > > > best->current_weight -= total; > > - best->checked = now; > > + > > + if (now - best->checked > best->fail_timeout) { > > + best->checked = now; > > + } > > > > return best; > > } > > Still a bug exists because of this code: > > if (state & NGX_PEER_FAILED) { > /* ... */ > peer->fails++; > peer->accessed = now; > peer->checked = now; > /* .... */ > } else { > if (peer->accessed < peer->checked) { > peer->fails = 0; > } > } > > Consider upstream which fails at least once each second (actually it's enough > to fail each fail_timeout seconds). This upstream will be disabled as soon as > it > will fail max_fails times no matter what fail_timeout is. This is expected behaviour. Documentation is a bit simplified here, and fail_timeout is used like session time limit - the peer->fails counter is reset once there are no failures within fail_timeout. While this might be non-ideal for some use cases, it's certainly not a bug. > I propose the following solution: > let's have two bits per peer: > 1) avail - set if this peer is available for usage (fails < max_fails) > 2) avail_check - set if this peer was disabled, but its fail_timeout is > expired and > we sent one request to check if it's alive > and two timestamps: > 3) first_fail - we'll count number of fails only inside > [first_fail, first_fail + fail_timeout) interval > 4) unavail_from - peer is unavailable inside > [unavail_from, unavail_from + fail_timeout) interval > > first_fail will only be used if avail == 1, unavail_from only if avail == 0, > so we may use union here. > > Simplified code: > get: > if (!avail > && (now < unavail_from + fail_timeout || avail_check)) > { > continue; > } > > if (!avail) > avail_check =1; > free: > if (avail && now >= first_fail + fail_timeout) > fails = 0; > > if (request_failed) { > if (fails == 0) > first_fail = now; > fails++; > if (fails >= max_fails) { > avail = 0; > unavail_from = now; > } > avail_check = 0; > } else { > if (avail_check) { > fails = 0; > avail = 1; > avail_check = 0; > } > } Such algorithm forget everything about previous failures once per fail_timeout, and won't detect bursts of failures split across two fail_timeout intervals. Consider: - max_fails = 3, fail_timeout = 10s - failure at 0s - failure at 9s - at 10s peer->fails counter is reset - failure at 11s - failure at 12s While 3 failures are only 3 seconds away from each other, this is not detected due to granularity introduced by the algorithm. -- Maxim Dounin http://nginx.org/en/donation.html _______________________________________________ nginx-devel mailing list nginx-devel@nginx.org http://mailman.nginx.org/mailman/listinfo/nginx-devel