Hi Willy and the list,

I haven't been able to find time for haproxy for some weeks. Now that I'm on
holiday, I'm reviewing some patches I had on my test machine.

One of them adds the ability to limit the number of HTTP keep-alive
connections, to allow better concurrency between clients.

I propose adding a suboption to "http-server-close" that lets haproxy fall
back to "httpclose" mode once a certain number of connections is reached on
the frontend.
The value can be defined:
- as an absolute limit
  Example:
      maxconn 1000
      option http-server-close limit 500
- or as a percentage of the frontend maxconn
  Example:
      maxconn 1000
      option http-server-close limit 75%
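The intended semantics can be sketched in Python (this is only a model of the
proposed suboption, not haproxy's actual C code; `parse_limit` and
`keep_alive_allowed` are hypothetical names):

```python
def parse_limit(limit: str, maxconn: int) -> int:
    """Return the connection count above which keep-alive is disabled.

    The limit may be an absolute number ("500") or a percentage of the
    frontend's maxconn ("75%").
    """
    if limit.endswith("%"):
        return maxconn * int(limit[:-1]) // 100
    return int(limit)

def keep_alive_allowed(current_conns: int, threshold: int) -> bool:
    """Below the threshold, keep-alive is honored; at or above it, haproxy
    would fall back to "httpclose" behavior for new requests."""
    return current_conns < threshold

# With "maxconn 1000" and "limit 75%", keep-alive stops at 750 connections.
threshold = parse_limit("75%", 1000)
```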
Let me illustrate the benefits; sorry if it's a bit long to read ;-)
* THE CONFIGURATION
First, I used this configuration:
(maxconn values were set to 150 to ease the tests on a laptop that was not
tuned for a high number of connections)
global
    log localhost local7 debug err

defaults
    timeout server 60s
    timeout client 60s
    timeout connect 5s
    timeout http-keep-alive 5s
    log global
    option httplog

listen scl-without-limit
    bind :8000
    maxconn 150
    mode http
    option http-server-close
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen close
    bind :8001
    maxconn 150
    mode http
    option httpclose
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-75pct
    bind :8002
    maxconn 150
    mode http
    option http-server-close limit 75%
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-95pct
    bind :8003
    maxconn 150
    mode http
    option http-server-close limit 95%
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-50pct
    bind :8004
    maxconn 150
    mode http
    option http-server-close limit 50%
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150

listen scl-with-limit-25pct
    bind :8005
    maxconn 150
    mode http
    option http-server-close limit 25%
    capture request header User-Agent len 5
    server local 127.0.0.1:80 maxconn 150
And I defined a test URL that waits some time before replying (100ms in these
tests).
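The actual test URL isn't shown; a minimal stand-in (my sketch, not the setup
that was actually used) could be a tiny HTTP/1.1 server that sleeps 100ms
before answering, so keep-alive connections can be exercised:

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

DELAY = 0.1  # 100 ms, as in the tests above

class DelayHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # needed so connections are kept alive

    def do_GET(self):
        time.sleep(DELAY)
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the console quiet during benchmarks
        pass

# To run it behind haproxy's "server local 127.0.0.1:80 ..." line:
#     HTTPServer(("127.0.0.1", 80), DelayHandler).serve_forever()
```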
* THE SCENARIO
The scenario I used is:

  ab -H "User-Agent: test1" -n10000 -c150 -k http://localhost:<port>/ &
  sleep 1
  ab -H "User-Agent: test2" -n10000 -c150 -k http://localhost:<port>/ &
  sleep 1
  curl -H "User-Agent: test3" http://localhost:<port>/

and as soon as both "ab" instances are done, I launch a final "ab" run for
comparison:

  ab -H "User-Agent: test4" -n10000 -c150 -k http://localhost:<port>/
I've written a log analyzer to sum up the scenario execution, second by
second.
For each test, it shows:
- the HTTP keep-alive efficiency
- when the test could really obtain its first response (the '|' character
indicates that the test has started but is waiting for a connection)
- how long the test ran before obtaining its last response
It also shows the global keep-alive efficiency measured.
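The analyzer itself isn't attached; its core aggregation could look roughly
like this (a sketch under the assumption that each log line has already been
reduced to a (second, test-name, reused-connection) tuple — haproxy's real log
format carries many more fields):

```python
from collections import defaultdict

def keep_alive_efficiency(records):
    """records: iterable of (second, test, reused) tuples, where `reused` is
    True when the request arrived on an already-open (kept-alive) connection.

    Returns {second: {test: efficiency_percent}}, the per-second share of
    requests that reused a connection, as printed in the tables below.
    """
    totals = defaultdict(lambda: defaultdict(int))
    reused = defaultdict(lambda: defaultdict(int))
    for second, test, was_reused in records:
        totals[second][test] += 1
        if was_reused:
            reused[second][test] += 1
    return {
        sec: {test: round(100.0 * reused[sec][test] / n, 2)
              for test, n in tests.items()}
        for sec, tests in totals.items()
    }
```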
* USING option http-server-close
Let's see what happens with this scenario when we use the current
"http-server-close" option:
Date Frontend {test1} {test2} {test3} {test4} Global
00:00:00 scl-without-limit 100 100
00:00:01 scl-without-limit 100 | 100
00:00:02 scl-without-limit 100 | | 100
00:00:03 scl-without-limit 100 | | 100
00:00:04 scl-without-limit 100 | | 100
00:00:05 scl-without-limit 100 | | 100
00:00:06 scl-without-limit 100 | | 100
00:00:07 scl-without-limit 100 | | 100
00:00:08 scl-without-limit 100 | | 100
00:00:09 scl-without-limit 100 | | 100
00:00:10 scl-without-limit 100 | | 100
00:00:11 scl-without-limit 100 | | 100
00:00:12 scl-without-limit 100 | | 100
00:00:13 scl-without-limit 100 | 100
00:00:14 scl-without-limit 100 | 100
00:00:15 scl-without-limit 100 | 100
00:00:16 scl-without-limit 100 | 100
00:00:17 scl-without-limit 100 | 100
00:00:18 scl-without-limit 100 | 100
00:00:19 scl-without-limit 100 | 100
00:00:20 scl-without-limit 100 | 100
00:00:21 scl-without-limit 100 100
00:01:22 scl-without-limit 100 100
00:01:23 scl-without-limit 100 100
00:01:24 scl-without-limit 100 100
00:01:25 scl-without-limit 100 100
00:01:26 scl-without-limit 100 100
00:01:27 scl-without-limit 100 100
00:01:28 scl-without-limit 100 100
00:01:29 scl-without-limit 100 100
- test1 used all the connections allowed by haproxy.
- test2 couldn't obtain any connection until test1 was finished.
- test3 also had to wait until test1 and test2 were finished (sometimes it can
be processed in parallel with test2, depending on test2's ability to take
all the connections first).
- each test could use keep-alive connections.
* USING option httpclose
Now, if we compare with "option httpclose":
Date Frontend {test1} {test2} {test3} {test4} Global
00:00:00 close 0 0
00:00:01 close 0 0 0
00:00:02 close 0 0 0 0
00:00:03 close 0 0 0
00:00:04 close 0 0 0
00:00:05 close 0 0 0
00:00:06 close 0 0 0
00:00:07 close 0 0 0
00:00:08 close 0 0 0
00:00:09 close 0 0 0
00:00:10 close 0 0 0
00:00:11 close 0 0 0
00:00:12 close 0 0 0
00:00:13 close 0 0 0
00:00:14 close 0 0 0
00:00:15 close 0 0
00:00:16 close 0 0
00:00:17 close 0 0
00:00:18 close 0 0
00:00:19 close 0 0
00:00:20 close 0 0
00:00:21 close 0 0
00:00:22 close 0 0
00:00:23 close 0 0
- test1, test2 and test3 could run concurrently.
- as expected, no keep-alive connections were used.
* NOW USING http-server-close limit 75%
Once patched, how does haproxy manage the same scenario when 75% of the
connections may use HTTP keep-alive?
Date Frontend {test1} {test2} {test3} {test4} Global
00:00:00 scl-with-limit-75pct 75.57 75.57
00:00:01 scl-with-limit-75pct 88.87 0 73.12
00:00:02 scl-with-limit-75pct 93.24 0 0 74.93
00:00:03 scl-with-limit-75pct 93.56 0 74.77
00:00:04 scl-with-limit-75pct 93.92 0 73.47
00:00:05 scl-with-limit-75pct 94.39 0 74.61
00:00:06 scl-with-limit-75pct 92.86 0 74.16
00:00:07 scl-with-limit-75pct 94.64 0 74.12
00:00:08 scl-with-limit-75pct 92.39 0 73.88
00:00:09 scl-with-limit-75pct 91.67 7.97 47.92
00:00:10 scl-with-limit-75pct 15.2 15.2
00:00:11 scl-with-limit-75pct 14.91 14.91
00:00:12 scl-with-limit-75pct 14.78 14.78
00:00:13 scl-with-limit-75pct 14.94 14.94
00:00:14 scl-with-limit-75pct 16.92 16.92
00:00:15 scl-with-limit-75pct 100 100
00:00:16 scl-with-limit-75pct 73.83 73.83
00:00:17 scl-with-limit-75pct 74.68 74.68
00:00:18 scl-with-limit-75pct 73.6 73.6
00:00:19 scl-with-limit-75pct 74.42 74.42
00:00:20 scl-with-limit-75pct 74.55 74.55
00:00:21 scl-with-limit-75pct 74.65 74.65
00:00:22 scl-with-limit-75pct 73.56 73.56
00:00:23 scl-with-limit-75pct 74.62 74.62
- test1, test2 and test3 could run concurrently.
- 75% of the global connections could still use HTTP keep-alive.
- As test2 started after test1 had reached the limit, it couldn't use
keep-alive connections until test1 finished.
- test4 shows that once alone, a test can use almost 75% keep-alive
connections.
The same observations can be made with different values, depending on how we
want to tune the proxy.
For example, with 95% of the connections:
Date Frontend {test1} {test2} {test3} {test4} Global
00:00:00 scl-with-limit-95pct 94.32 94.32
00:00:01 scl-with-limit-95pct 98.73 0 94.1
00:00:02 scl-with-limit-95pct 100 0 | 94.3
00:00:03 scl-with-limit-95pct 99.3 0 | 93.83
00:00:04 scl-with-limit-95pct 99.35 0 0 94.02
00:00:05 scl-with-limit-95pct 100 0 93.88
00:00:06 scl-with-limit-95pct 99.42 0 94.27
00:00:07 scl-with-limit-95pct 100 0 93.86
00:00:08 scl-with-limit-95pct 97.73 0 79.26
00:00:09 scl-with-limit-95pct 0 87.87 86.29
00:00:10 scl-with-limit-95pct 100 100
00:00:11 scl-with-limit-95pct 100 100
00:00:12 scl-with-limit-95pct 100 100
00:00:13 scl-with-limit-95pct 88.93 88.93
00:00:14 scl-with-limit-95pct 79.38 79.38
00:00:15 scl-with-limit-95pct 78.86 78.86
00:00:16 scl-with-limit-95pct 84.3 84.3
00:00:17 scl-with-limit-95pct 94.49 94.49
00:00:18 scl-with-limit-95pct 93.94 93.94
00:00:19 scl-with-limit-95pct 94.01 94.01
00:00:20 scl-with-limit-95pct 94.27 94.27
00:00:21 scl-with-limit-95pct 93.99 93.99
00:00:22 scl-with-limit-95pct 93.97 93.97
00:00:23 scl-with-limit-95pct 93.92 93.92
00:00:24 scl-with-limit-95pct 94.23 94.23
- as most of the connections were used by test1 and test2, test3 was slightly
delayed until a connection became available (because 5% of the connections
are not persistent).
The same tests were also done with a 1 MB static file, with the same results.
If you're OK with the idea, I can release a patch once I've had time to
update the documentation.
--
Cyril Bonté