https://issues.apache.org/bugzilla/show_bug.cgi?id=46215
Summary: Race condition in bybusyness algorithm
Product: Apache httpd-2
Version: 2.2.10
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: mod_proxy_balancer
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]
Created an attachment (id=22876)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=22876)
Patch resolving the issue
In scenarios with large numbers of backend workers, a race condition can cause
decrements of the per-worker 'busy' counter to be lost, so the counter never
returns to its true value and some workers are ignored completely. When the
balancer is idle, all workers should be at the same busyness.
ps output showing the effect:
deploy 27326 14.3 3.2 271728 130632 ? Sl 23:23 1:03 mongrel_rails [8000/0/289]: idle
deploy 27329 15.4 3.7 298428 150368 ? Sl 23:23 1:08 mongrel_rails [8001/0/289]: idle
deploy 27332 16.6 3.8 296292 154976 ? Sl 23:23 1:13 mongrel_rails [8002/0/288]: idle
deploy 27335 15.5 3.3 279404 136820 ? Sl 23:23 1:08 mongrel_rails [8003/0/289]: idle
deploy 27338 16.6 3.4 280396 139452 ? Sl 23:23 1:13 mongrel_rails [8004/0/290]: idle
deploy 27341 13.6 3.3 275600 134724 ? Sl 23:23 1:00 mongrel_rails [8005/0/288]: idle
deploy 27344  1.1 1.5 155708  62616 ? Sl 23:23 0:04 mongrel_rails [8006/0/7]: idle
deploy 27347 16.2 3.7 299976 153908 ? Sl 23:23 1:11 mongrel_rails [8007/0/287]: idle
deploy 27350  1.3 2.5 241708 104364 ? Sl 23:23 0:05 mongrel_rails [8008/0/5]: idle
deploy 27354  1.4 2.6 246368 109044 ? Sl 23:23 0:06 mongrel_rails [8009/0/4]: idle
deploy 27359  1.0 1.4 151124  58096 ? Sl 23:23 0:04 mongrel_rails [8010/0/0]: idle
deploy 27362  0.9 1.4 151140  58112 ? Sl 23:23 0:04 mongrel_rails [8011/0/0]: idle
balancer-manager output, showing all balancers are in 'Ok' state:
Worker URL          Route  RouteRedir  Factor  Set  Status  Elected  To    From
http://cimbar:8000                     1       0    Ok      415      315K  22M
http://cimbar:8001                     1       0    Ok      416      324K  22M
http://cimbar:8002                     1       0    Ok      484      392K  27M
http://cimbar:8003                     1       0    Ok      483      381K  26M
http://cimbar:8004                     1       0    Ok      484      379K  26M
http://cimbar:8005                     1       0    Ok      484      374K  25M
http://cimbar:8006                     1       0    Ok      52       44K   2.6M
http://cimbar:8007                     1       0    Ok      608      474K  34M
http://cimbar:8008                     1       0    Ok      53       41K   2.6M
http://cimbar:8009                     1       0    Ok      53       43K   2.9M
http://cimbar:8010                     1       0    Ok      5        1.1K  6.6K
http://cimbar:8011                     1       0    Ok      7        1.2K  62K
When we look at the debug logs, however, we see that the 'busy' counters for
ports 8006 and 8008-8011 are all stuck at 3, and that these are exactly the
workers which are not receiving requests.
[EMAIL PROTECTED]:/var/log/apache2$ for port in {8000..8011}; do fgrep "bybusyness selected worker \"http://cimbar:${port}" /tmp/logfile | tail -n1; done
[Thu Nov 13 23:32:39 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8000" : busy 2 : lbstatus -1922
[Thu Nov 13 23:32:45 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8001" : busy 2 : lbstatus -1910
[Thu Nov 13 23:34:24 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8002" : busy 2 : lbstatus -2233
[Thu Nov 13 23:34:25 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8003" : busy 2 : lbstatus -2236
[Thu Nov 13 23:34:23 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8004" : busy 2 : lbstatus -2234
[Thu Nov 13 23:34:24 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8005" : busy 2 : lbstatus -2236
[Thu Nov 13 23:32:45 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8006" : busy 3 : lbstatus 2468
[Thu Nov 13 23:34:25 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8007" : busy 1 : lbstatus -3444
[Thu Nov 13 23:33:54 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8008" : busy 3 : lbstatus 2724
[Thu Nov 13 23:32:43 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8009" : busy 3 : lbstatus 2459
[Thu Nov 13 23:32:39 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8010" : busy 3 : lbstatus 2987
[Thu Nov 13 23:32:45 2008] [debug] mod_proxy_balancer.c(1173): proxy: bybusyness selected worker "http://cimbar:8011" : busy 3 : lbstatus 2983
I've traced this effect to modules/proxy/mod_proxy_balancer.c:578-579 where the
counter is incremented. There is no mutex locking around this code, so a race
condition ensues where simultaneous incrementing and decrementing by separate
threads on separate CPUs can result in an upward skew.
While the skew could in principle occur downwards as well, the decrement code
only runs when busyness is greater than zero, so the counter can never go
negative. A downward skew is only possible from a busyness of 2 or greater,
and a worker skewed downwards would immediately receive more traffic, so its
counter would rapidly rise again. The upward skew is effectively bounded by
the maximum concurrent busyness the server handles, since a worker skewed
above its peers is ignored until all other workers reach the same level. It
would take a very high sustained load to push the skew far beyond a busyness
of 3.
I've attached a patch which moves the decrement code in
proxy_balancer_post_request() up into the previously-commented-out mutex
lock/unlock code in the same function. This has so far resolved the issue on
our production system.