Hi Cyril,

On Sun, Aug 28, 2011 at 06:27:15PM +0200, Cyril Bonté wrote:
> I couldn't find time for haproxy for some weeks. Now that I'm on
> holidays, I'm trying to review some patches I had on my test machine.
Hehe, it's a nice activity for a holiday ;-)

> One of them is the possibility to limit the number of HTTP keep-alive
> connections to allow better concurrency between clients.
>
> I propose to add a suboption to the "http-server-close" one to let
> haproxy fall back to an "httpclose" mode once a certain number of
> connections on the frontend is reached.
> The value can be defined :
> - as an absolute limit
>   Example :
>       maxconn 1000
>       option http-server-close limit 500
>
> - as a percentage of the frontend maxconn
>   Example :
>       maxconn 1000
>       option http-server-close limit 75%
>
> Let me illustrate the benefits, sorry if it's a bit long to read ;-)
>
> * THE CONFIGURATION
>
> First, I used this configuration :
> (maxconn values were set to 150 to ease the tests on a laptop that was
> not tuned for a high number of connections)
>
>     global
>         log localhost local7 debug err
>
>     defaults
>         timeout server 60s
>         timeout client 60s
>         timeout connect 5s
>         timeout http-keep-alive 5s
>         log global
>         option httplog
>
>     listen scl-without-limit
>         bind :8000
>         maxconn 150
>         mode http
>         option http-server-close
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen close
>         bind :8001
>         maxconn 150
>         mode http
>         option httpclose
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen scl-with-limit-75pct
>         bind :8002
>         maxconn 150
>         mode http
>         option http-server-close limit 75%
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen scl-with-limit-95pct
>         bind :8003
>         maxconn 150
>         mode http
>         option http-server-close limit 95%
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen scl-with-limit-50pct
>         bind :8004
>         maxconn 150
>         mode http
>         option http-server-close limit 50%
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen scl-with-limit-25pct
>         bind :8005
>         maxconn 150
>         mode http
>         option http-server-close limit 25%
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
> And I defined a test URL that waits some time before replying (100ms in
> these tests).
>
> * THE SCENARIO
>
> The scenario I used is :
>
>     ab -H "User-Agent: test1" -n10000 -c150 -k http://localhost:<port>/ &
>     sleep 1
>     ab -H "User-Agent: test2" -n10000 -c150 -k http://localhost:<port>/ &
>     sleep 1
>     curl -H "User-Agent: test3" http://localhost:<port>/
>
> and as soon as both "ab" instances are done, I launch a final "ab" test
> to compare :
>
>     ab -H "User-Agent: test4" -n10000 -c150 -k http://localhost:<port>/
>
> I've written a log analyzer to sum up the scenario execution, second by
> second. For each test, it shows :
> - the HTTP keep-alive efficiency
> - when the test could really obtain its first response (the '|' character
>   indicates that the test has started but is still waiting for a
>   connection)
> - how long the test ran to obtain the last response
> and the global keep-alive efficiency measured.
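(The patch itself is not shown in this thread. As a rough illustration of
the behaviour Cyril describes, here is a minimal standalone C sketch of the
check such a patch might perform when deciding whether a response may keep
the client connection alive. All names in it, struct frontend_limits,
may_keep_alive, ka_limit, are hypothetical, not HAProxy internals.)

    /* Hypothetical sketch: once the number of established frontend
     * connections crosses the configured limit (absolute, or a
     * percentage of the frontend's maxconn), stop allowing keep-alive
     * and close after each response, as "option httpclose" would.
     */
    #include <stdbool.h>

    struct frontend_limits {
        unsigned int maxconn;       /* frontend maxconn */
        unsigned int ka_limit;      /* absolute limit, 0 if unset */
        unsigned int ka_limit_pct;  /* percent limit, 0 if unset */
    };

    /* returns true when the response may keep the connection alive */
    static bool may_keep_alive(const struct frontend_limits *fe,
                               unsigned int feconn)
    {
        unsigned int limit = fe->maxconn;   /* default: no restriction */

        if (fe->ka_limit)
            limit = fe->ka_limit;
        else if (fe->ka_limit_pct)
            limit = fe->maxconn * fe->ka_limit_pct / 100;

        /* below the threshold: keep-alive; at or above: close mode */
        return feconn < limit;
    }

(With "maxconn 1000" and "limit 75%", the threshold works out to 750:
connections accepted beyond the 750th concurrent one would be served in
close mode.)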
>
> * USING option http-server-close
>
> Let's see what happens with this scenario when we use the current
> "http-server-close" option :
>
> Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
> 00:00:00  scl-without-limit     100                                 100
> 00:00:01  scl-without-limit     100      |                          100
> 00:00:02  scl-without-limit     100      |        |                 100
> 00:00:03  scl-without-limit     100      |        |                 100
> 00:00:04  scl-without-limit     100      |        |                 100
> 00:00:05  scl-without-limit     100      |        |                 100
> 00:00:06  scl-without-limit     100      |        |                 100
> 00:00:07  scl-without-limit     100      |        |                 100
> 00:00:08  scl-without-limit     100      |        |                 100
> 00:00:09  scl-without-limit     100      |        |                 100
> 00:00:10  scl-without-limit     100      |        |                 100
> 00:00:11  scl-without-limit     100      |        |                 100
> 00:00:12  scl-without-limit     100      |        |                 100
> 00:00:13  scl-without-limit              100      |                 100
> 00:00:14  scl-without-limit              100      |                 100
> 00:00:15  scl-without-limit              100      |                 100
> 00:00:16  scl-without-limit              100      |                 100
> 00:00:17  scl-without-limit              100      |                 100
> 00:00:18  scl-without-limit              100      |                 100
> 00:00:19  scl-without-limit              100      |                 100
> 00:00:20  scl-without-limit              100      |                 100
> 00:00:21  scl-without-limit              100                        100
> 00:01:22  scl-without-limit                                100      100
> 00:01:23  scl-without-limit                                100      100
> 00:01:24  scl-without-limit                                100      100
> 00:01:25  scl-without-limit                                100      100
> 00:01:26  scl-without-limit                                100      100
> 00:01:27  scl-without-limit                                100      100
> 00:01:28  scl-without-limit                                100      100
> 00:01:29  scl-without-limit                                100      100
>
> - test1 used all the connections allowed by haproxy.
> - test2 couldn't obtain any connection until test1 was finished.
> - test3 also had to wait until test1 and test2 were finished (it can
>   sometimes be processed in parallel with test2, depending on whether
>   test2 grabs all the connections first).
> - each test could use keep-alive connections.
>
> * USING option httpclose
>
> Now, if we compare with "option httpclose" :
>
> Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
> 00:00:00  close                 0                                   0
> 00:00:01  close                 0        0                          0
> 00:00:02  close                 0        0        0                 0
> 00:00:03  close                 0        0                          0
> 00:00:04  close                 0        0                          0
> 00:00:05  close                 0        0                          0
> 00:00:06  close                 0        0                          0
> 00:00:07  close                 0        0                          0
> 00:00:08  close                 0        0                          0
> 00:00:09  close                 0        0                          0
> 00:00:10  close                 0        0                          0
> 00:00:11  close                 0        0                          0
> 00:00:12  close                 0        0                          0
> 00:00:13  close                 0        0                          0
> 00:00:14  close                 0        0                          0
> 00:00:15  close                          0                          0
> 00:00:16  close                          0                          0
> 00:00:17  close                          0                          0
> 00:00:18  close                          0                          0
> 00:00:19  close                          0                          0
> 00:00:20  close                          0                          0
> 00:00:21  close                          0                          0
> 00:00:22  close                          0                          0
> 00:00:23  close                          0                          0
>
> - test1, test2 and test3 could run concurrently.
> - as wanted, no keep-alive connections were used.
>
> * NOW USING http-server-close limit 75%
>
> Once patched, how does haproxy manage the same scenario while still
> allowing 75% of HTTP keep-alive connections ?
>
> Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
> 00:00:00  scl-with-limit-75pct  75.57                               75.57
> 00:00:01  scl-with-limit-75pct  88.87    0                           73.12
> 00:00:02  scl-with-limit-75pct  93.24    0        0                  74.93
> 00:00:03  scl-with-limit-75pct  93.56    0                           74.77
> 00:00:04  scl-with-limit-75pct  93.92    0                           73.47
> 00:00:05  scl-with-limit-75pct  94.39    0                           74.61
> 00:00:06  scl-with-limit-75pct  92.86    0                           74.16
> 00:00:07  scl-with-limit-75pct  94.64    0                           74.12
> 00:00:08  scl-with-limit-75pct  92.39    0                           73.88
> 00:00:09  scl-with-limit-75pct  91.67    7.97                        47.92
> 00:00:10  scl-with-limit-75pct           15.2                        15.2
> 00:00:11  scl-with-limit-75pct           14.91                       14.91
> 00:00:12  scl-with-limit-75pct           14.78                       14.78
> 00:00:13  scl-with-limit-75pct           14.94                       14.94
> 00:00:14  scl-with-limit-75pct           16.92                       16.92
> 00:00:15  scl-with-limit-75pct           100                         100
> 00:00:16  scl-with-limit-75pct                             73.83     73.83
> 00:00:17  scl-with-limit-75pct                             74.68     74.68
> 00:00:18  scl-with-limit-75pct                             73.6      73.6
> 00:00:19  scl-with-limit-75pct                             74.42     74.42
> 00:00:20  scl-with-limit-75pct                             74.55     74.55
> 00:00:21  scl-with-limit-75pct                             74.65     74.65
> 00:00:22  scl-with-limit-75pct                             73.56     73.56
> 00:00:23  scl-with-limit-75pct                             74.62     74.62
>
> - test1, test2 and test3 could run concurrently.
> - 75% of the global connections could still use HTTP keep-alive.
> - as test2 started after test1 reached the limit, it couldn't use
>   keep-alive connections until test1 finished.
> - test4 shows that once alone, a test can use almost 75% keep-alive
>   connections.
>
> The same observations can be made with different values, depending on how
> we want to tune the proxy. For example, with 95% of the connections :
>
> Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
> 00:00:00  scl-with-limit-95pct  94.32                               94.32
> 00:00:01  scl-with-limit-95pct  98.73    0                          94.1
> 00:00:02  scl-with-limit-95pct  100      0        |                 94.3
> 00:00:03  scl-with-limit-95pct  99.3     0        |                 93.83
> 00:00:04  scl-with-limit-95pct  99.35    0        0                 94.02
> 00:00:05  scl-with-limit-95pct  100      0                          93.88
> 00:00:06  scl-with-limit-95pct  99.42    0                          94.27
> 00:00:07  scl-with-limit-95pct  100      0                          93.86
> 00:00:08  scl-with-limit-95pct  97.73    0                          79.26
> 00:00:09  scl-with-limit-95pct  0        87.87                      86.29
> 00:00:10  scl-with-limit-95pct           100                        100
> 00:00:11  scl-with-limit-95pct           100                        100
> 00:00:12  scl-with-limit-95pct           100                        100
> 00:00:13  scl-with-limit-95pct           88.93                      88.93
> 00:00:14  scl-with-limit-95pct           79.38                      79.38
> 00:00:15  scl-with-limit-95pct           78.86                      78.86
> 00:00:16  scl-with-limit-95pct           84.3                       84.3
> 00:00:17  scl-with-limit-95pct                             94.49    94.49
> 00:00:18  scl-with-limit-95pct                             93.94    93.94
> 00:00:19  scl-with-limit-95pct                             94.01    94.01
> 00:00:20  scl-with-limit-95pct                             94.27    94.27
> 00:00:21  scl-with-limit-95pct                             93.99    93.99
> 00:00:22  scl-with-limit-95pct                             93.97    93.97
> 00:00:23  scl-with-limit-95pct                             93.92    93.92
> 00:00:24  scl-with-limit-95pct                             94.23    94.23
>
> - as most of the connections are used by test1 and test2, test3 was
>   slightly delayed until a connection became available (because 5% of
>   them are not persistent).
>
> The same tests were also done with a 1 MB static file, with the same
> results.
>
> If the idea is OK for you, I can release a patch once I've had time to
> review the documentation.

After this long reading, I must say I'm not fond of this at all, for
several reasons :
- if a maxconn limit is too low to sustain keep-alive connections, it will
  also be too low to sustain higher response times, so the maxconn must be
  raised anyway.
- by limiting the number of keep-alive connections to a certain amount,
  you in fact prevent a number of clients from using keep-alive even when
  there would have been enough resources for them.
- by preventing only new connections from using keep-alive, we create a
  certain amount of unfairness between old connections and new ones.

I'd rather proceed differently. For the server-side keep-alive, I have
already identified the need for a list of per-server and per-backend idle
connections. I have also identified the need for a per-frontend list of
dead connections (tarpit). Similarly, we should have a list of per-frontend
idle connections. The condition to accept a new connection would then be
that the number of connections on the backend is below (maxconn -
idleconns). We would then simply close one of the older idle connections
if too many connections are already open, and let the new one take its
place. That way, the impact is minimal and up to maxconn connections can
use keep-alive. And fairness is retained, since old idle connections will
be closed instead of new ones being prevented from using keep-alive.

If you're interested in doing this, I'd be glad to merge it and to provide
help if needed. We need a "struct list fe_idle" in the struct proxy and
add/remove idle connections there.

Cheers,
Willy
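(For illustration, the eviction scheme described above might look roughly
like the following standalone C sketch. Everything here is hypothetical:
the struct frontend, fe_may_accept() and conn_close() names, and the
hand-rolled doubly-linked list, merely stand in for HAProxy's own list
type and the proposed "struct list fe_idle".)

    /* The frontend keeps its idle keep-alive connections in a list
     * ordered from oldest to newest. When a new client connects while
     * the frontend is already at maxconn, the oldest idle connection
     * is closed to make room, so new clients are never refused
     * keep-alive just because older ones got there first.
     */
    #include <stddef.h>
    #include <stdbool.h>

    struct connection {
        int fd;
        struct connection *prev, *next;   /* links in the idle list */
    };

    struct frontend {
        unsigned int maxconn;
        unsigned int curconn;             /* currently open connections */
        struct connection *idle_head;     /* oldest idle connection */
        struct connection *idle_tail;     /* newest idle connection */
    };

    /* hypothetical helper: close the fd and release the connection */
    extern void conn_close(struct connection *conn);

    static void idle_unlink(struct frontend *fe, struct connection *c)
    {
        if (c->prev) c->prev->next = c->next; else fe->idle_head = c->next;
        if (c->next) c->next->prev = c->prev; else fe->idle_tail = c->prev;
        c->prev = c->next = NULL;
    }

    /* called on accept() : returns false only when no room can be made */
    static bool fe_may_accept(struct frontend *fe)
    {
        if (fe->curconn < fe->maxconn)
            return true;                  /* room left, accept as usual */

        if (fe->idle_head) {
            /* full, but an idle keep-alive connection exists: evict
             * the oldest one so the new connection can take its place */
            struct connection *victim = fe->idle_head;
            idle_unlink(fe, victim);
            conn_close(victim);
            fe->curconn--;
            return true;
        }
        return false;                     /* full and every connection busy */
    }

(The accept path stays O(1): the list head is always the oldest idle
connection, so making room costs a single unlink and close.)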

