Hi Cyril,

On Sun, Aug 28, 2011 at 06:27:15PM +0200, Cyril Bonté wrote:
> I couldn't find time for haproxy for some weeks. Now that I'm on
> holidays, I'm trying to review some patches I had on my test machine.
Hehe, it's a nice activity for a holiday ;-)

> One of them is the possibility to limit the number of HTTP keep-alive
> connections to allow better concurrency between clients.
>
> I propose to add a suboption to the "http-server-close" one to let
> haproxy fall back to an "httpclose" mode once a certain number of
> connections on the frontend is reached.
> The value can be defined :
> - as an absolute limit
>   Example :
>       maxconn 1000
>       option http-server-close limit 500
>
> - as a percentage of the frontend maxconn
>   Example :
>       maxconn 1000
>       option http-server-close limit 75%
>
> Let me illustrate the benefits, sorry if it's a bit long to read ;-)
>
> * THE CONFIGURATION
>
> First, I used this configuration :
> (maxconn values were set to 150 to ease the tests on a laptop that was
> not tuned for a high number of connections)
>
>     global
>         log localhost local7 debug err
>
>     defaults
>         timeout server 60s
>         timeout client 60s
>         timeout connect 5s
>         timeout http-keep-alive 5s
>         log global
>         option httplog
>
>     listen scl-without-limit
>         bind :8000
>         maxconn 150
>         mode http
>         option http-server-close
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen close
>         bind :8001
>         maxconn 150
>         mode http
>         option httpclose
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen scl-with-limit-75pct
>         bind :8002
>         maxconn 150
>         mode http
>         option http-server-close limit 75%
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen scl-with-limit-95pct
>         bind :8003
>         maxconn 150
>         mode http
>         option http-server-close limit 95%
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen scl-with-limit-50pct
>         bind :8004
>         maxconn 150
>         mode http
>         option http-server-close limit 50%
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
>     listen scl-with-limit-25pct
>         bind :8005
>         maxconn 150
>         mode http
>         option http-server-close limit 25%
>         capture request header User-Agent len 5
>         server local 127.0.0.1:80 maxconn 150
>
> And I defined a test URL that waits some time before replying (100ms in
> these tests).
>
> * THE SCENARIO
>
> The scenario I used is :
>
>     ab -H "User-Agent: test1" -n10000 -c150 -k http://localhost:<port>/ &
>     sleep 1
>     ab -H "User-Agent: test2" -n10000 -c150 -k http://localhost:<port>/ &
>     sleep 1
>     curl -H "User-Agent: test3" http://localhost:<port>/
>
> and as soon as both "ab" instances are done, I launch a final "ab" test
> to compare :
>
>     ab -H "User-Agent: test4" -n10000 -c150 -k http://localhost:<port>/
>
> I've written a log analyzer to sum up the scenario execution, second by
> second. For each test, it shows :
> - the HTTP keep-alive efficiency
> - when the test could really obtain its first response (the '|' character
>   indicates that the test has started but is still waiting for a
>   connection)
> - how long the test ran to obtain the last response
> and the global keep-alive efficiency measured.
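(The patch itself is not shown in this thread. As a rough illustration of
the behaviour Cyril describes, here is a minimal standalone C sketch of the
check such a patch might perform when deciding whether a response may keep
the client connection alive. All names in it, struct frontend_limits,
may_keep_alive, ka_limit, are hypothetical, not HAProxy internals.)

    /* Hypothetical sketch: once the number of established frontend
     * connections crosses the configured limit (absolute, or a
     * percentage of the frontend's maxconn), stop allowing keep-alive
     * and close after each response, as "option httpclose" would.
     */
    #include <stdbool.h>

    struct frontend_limits {
        unsigned int maxconn;       /* frontend maxconn */
        unsigned int ka_limit;      /* absolute limit, 0 if unset */
        unsigned int ka_limit_pct;  /* percent limit, 0 if unset */
    };

    /* returns true when the response may keep the connection alive */
    static bool may_keep_alive(const struct frontend_limits *fe,
                               unsigned int feconn)
    {
        unsigned int limit = fe->maxconn;   /* default: no restriction */

        if (fe->ka_limit)
            limit = fe->ka_limit;
        else if (fe->ka_limit_pct)
            limit = fe->maxconn * fe->ka_limit_pct / 100;

        /* below the threshold: keep-alive; at or above: close mode */
        return feconn < limit;
    }

(With "maxconn 1000" and "limit 75%", the threshold works out to 750:
connections accepted beyond the 750th concurrent one would be served in
close mode.)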
>
> * USING option http-server-close
>
> Let's see what happens with this scenario when we use the current
> "http-server-close" option :
>
> Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
> 00:00:00  scl-without-limit     100                                 100
> 00:00:01  scl-without-limit     100      |                          100
> 00:00:02  scl-without-limit     100      |        |                 100
> 00:00:03  scl-without-limit     100      |        |                 100
> 00:00:04  scl-without-limit     100      |        |                 100
> 00:00:05  scl-without-limit     100      |        |                 100
> 00:00:06  scl-without-limit     100      |        |                 100
> 00:00:07  scl-without-limit     100      |        |                 100
> 00:00:08  scl-without-limit     100      |        |                 100
> 00:00:09  scl-without-limit     100      |        |                 100
> 00:00:10  scl-without-limit     100      |        |                 100
> 00:00:11  scl-without-limit     100      |        |                 100
> 00:00:12  scl-without-limit     100      |        |                 100
> 00:00:13  scl-without-limit              100      |                 100
> 00:00:14  scl-without-limit              100      |                 100
> 00:00:15  scl-without-limit              100      |                 100
> 00:00:16  scl-without-limit              100      |                 100
> 00:00:17  scl-without-limit              100      |                 100
> 00:00:18  scl-without-limit              100      |                 100
> 00:00:19  scl-without-limit              100      |                 100
> 00:00:20  scl-without-limit              100      |                 100
> 00:00:21  scl-without-limit              100                        100
> 00:01:22  scl-without-limit                                100      100
> 00:01:23  scl-without-limit                                100      100
> 00:01:24  scl-without-limit                                100      100
> 00:01:25  scl-without-limit                                100      100
> 00:01:26  scl-without-limit                                100      100
> 00:01:27  scl-without-limit                                100      100
> 00:01:28  scl-without-limit                                100      100
> 00:01:29  scl-without-limit                                100      100
>
> - test1 used all the connections allowed by haproxy.
> - test2 couldn't obtain any connection until test1 was finished.
> - test3 also had to wait until test1 and test2 were finished (it can
>   sometimes be processed in parallel with test2, depending on whether
>   test2 grabs all the connections first).
> - each test could use keep-alive connections.
>
> * USING option httpclose
>
> Now, if we compare with "option httpclose" :
>
> Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
> 00:00:00  close                 0                                   0
> 00:00:01  close                 0        0                          0
> 00:00:02  close                 0        0        0                 0
> 00:00:03  close                 0        0                          0
> 00:00:04  close                 0        0                          0
> 00:00:05  close                 0        0                          0
> 00:00:06  close                 0        0                          0
> 00:00:07  close                 0        0                          0
> 00:00:08  close                 0        0                          0
> 00:00:09  close                 0        0                          0
> 00:00:10  close                 0        0                          0
> 00:00:11  close                 0        0                          0
> 00:00:12  close                 0        0                          0
> 00:00:13  close                 0        0                          0
> 00:00:14  close                 0        0                          0
> 00:00:15  close                          0                          0
> 00:00:16  close                          0                          0
> 00:00:17  close                          0                          0
> 00:00:18  close                          0                          0
> 00:00:19  close                          0                          0
> 00:00:20  close                          0                          0
> 00:00:21  close                          0                          0
> 00:00:22  close                          0                          0
> 00:00:23  close                          0                          0
>
> - test1, test2 and test3 could run concurrently.
> - as wanted, no keep-alive connections were used.
>
> * NOW USING http-server-close limit 75%
>
> Once patched, how does haproxy manage the same scenario while still
> allowing 75% of HTTP keep-alive connections ?
>
> Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
> 00:00:00  scl-with-limit-75pct  75.57                               75.57
> 00:00:01  scl-with-limit-75pct  88.87    0                           73.12
> 00:00:02  scl-with-limit-75pct  93.24    0        0                  74.93
> 00:00:03  scl-with-limit-75pct  93.56    0                           74.77
> 00:00:04  scl-with-limit-75pct  93.92    0                           73.47
> 00:00:05  scl-with-limit-75pct  94.39    0                           74.61
> 00:00:06  scl-with-limit-75pct  92.86    0                           74.16
> 00:00:07  scl-with-limit-75pct  94.64    0                           74.12
> 00:00:08  scl-with-limit-75pct  92.39    0                           73.88
> 00:00:09  scl-with-limit-75pct  91.67    7.97                        47.92
> 00:00:10  scl-with-limit-75pct           15.2                        15.2
> 00:00:11  scl-with-limit-75pct           14.91                       14.91
> 00:00:12  scl-with-limit-75pct           14.78                       14.78
> 00:00:13  scl-with-limit-75pct           14.94                       14.94
> 00:00:14  scl-with-limit-75pct           16.92                       16.92
> 00:00:15  scl-with-limit-75pct           100                         100
> 00:00:16  scl-with-limit-75pct                             73.83     73.83
> 00:00:17  scl-with-limit-75pct                             74.68     74.68
> 00:00:18  scl-with-limit-75pct                             73.6      73.6
> 00:00:19  scl-with-limit-75pct                             74.42     74.42
> 00:00:20  scl-with-limit-75pct                             74.55     74.55
> 00:00:21  scl-with-limit-75pct                             74.65     74.65
> 00:00:22  scl-with-limit-75pct                             73.56     73.56
> 00:00:23  scl-with-limit-75pct                             74.62     74.62
>
> - test1, test2 and test3 could run concurrently.
> - 75% of the global connections could still use HTTP keep-alive.
> - as test2 started after test1 reached the limit, it couldn't use
>   keep-alive connections until test1 finished.
> - test4 shows that once alone, a test can use almost 75% keep-alive
>   connections.
>
> The same observations can be made with different values, depending on how
> we want to tune the proxy. For example, with 95% of the connections :
>
> Date      Frontend              {test1}  {test2}  {test3}  {test4}  Global
> 00:00:00  scl-with-limit-95pct  94.32                               94.32
> 00:00:01  scl-with-limit-95pct  98.73    0                          94.1
> 00:00:02  scl-with-limit-95pct  100      0        |                 94.3
> 00:00:03  scl-with-limit-95pct  99.3     0        |                 93.83
> 00:00:04  scl-with-limit-95pct  99.35    0        0                 94.02
> 00:00:05  scl-with-limit-95pct  100      0                          93.88
> 00:00:06  scl-with-limit-95pct  99.42    0                          94.27
> 00:00:07  scl-with-limit-95pct  100      0                          93.86
> 00:00:08  scl-with-limit-95pct  97.73    0                          79.26
> 00:00:09  scl-with-limit-95pct  0        87.87                      86.29
> 00:00:10  scl-with-limit-95pct           100                        100
> 00:00:11  scl-with-limit-95pct           100                        100
> 00:00:12  scl-with-limit-95pct           100                        100
> 00:00:13  scl-with-limit-95pct           88.93                      88.93
> 00:00:14  scl-with-limit-95pct           79.38                      79.38
> 00:00:15  scl-with-limit-95pct           78.86                      78.86
> 00:00:16  scl-with-limit-95pct           84.3                       84.3
> 00:00:17  scl-with-limit-95pct                             94.49    94.49
> 00:00:18  scl-with-limit-95pct                             93.94    93.94
> 00:00:19  scl-with-limit-95pct                             94.01    94.01
> 00:00:20  scl-with-limit-95pct                             94.27    94.27
> 00:00:21  scl-with-limit-95pct                             93.99    93.99
> 00:00:22  scl-with-limit-95pct                             93.97    93.97
> 00:00:23  scl-with-limit-95pct                             93.92    93.92
> 00:00:24  scl-with-limit-95pct                             94.23    94.23
>
> - as most of the connections are used by test1 and test2, test3 was
>   slightly delayed until a connection became available (because 5% of
>   them are not persistent).
>
> The same tests were also done with a 1 MB static file, with the same
> results.
>
> If the idea is OK for you, I can release a patch once I've had time to
> review the documentation.

After this long reading, I must say I'm not fond of this at all, for
several reasons :
- if a maxconn limit is too low to sustain keep-alive connections, it will
  also be too low to sustain higher response times, so the maxconn must be
  raised anyway.
- by limiting the number of keep-alive connections to a certain amount,
  you in fact prevent a number of clients from using keep-alive even when
  there would have been enough resources for them.
- by preventing only new connections from using keep-alive, we create a
  certain amount of unfairness between old connections and new ones.

I'd rather proceed differently. For the server-side keep-alive, I have
already identified the need for a list of per-server and per-backend idle
connections. I have also identified the need for a per-frontend list of
dead connections (tarpit). Similarly, we should have a list of per-frontend
idle connections. The condition to accept a new connection would then be
that the number of connections on the backend is below (maxconn -
idleconns). We would then simply close one of the older idle connections
if too many connections are already open, and let the new one take its
place. That way, the impact is minimal and up to maxconn connections can
use keep-alive. And fairness is retained, since old idle connections will
be closed instead of new ones being prevented from using keep-alive.

If you're interested in doing this, I'd be glad to merge it and to provide
help if needed. We need a "struct list fe_idle" in the struct proxy and
add/remove idle connections there.

Cheers,
Willy
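(For illustration, the eviction scheme described above might look roughly
like the following standalone C sketch. Everything here is hypothetical:
the struct frontend, fe_may_accept() and conn_close() names, and the
hand-rolled doubly-linked list, merely stand in for HAProxy's own list
type and the proposed "struct list fe_idle".)

    /* The frontend keeps its idle keep-alive connections in a list
     * ordered from oldest to newest. When a new client connects while
     * the frontend is already at maxconn, the oldest idle connection
     * is closed to make room, so new clients are never refused
     * keep-alive just because older ones got there first.
     */
    #include <stddef.h>
    #include <stdbool.h>

    struct connection {
        int fd;
        struct connection *prev, *next;   /* links in the idle list */
    };

    struct frontend {
        unsigned int maxconn;
        unsigned int curconn;             /* currently open connections */
        struct connection *idle_head;     /* oldest idle connection */
        struct connection *idle_tail;     /* newest idle connection */
    };

    /* hypothetical helper: close the fd and release the connection */
    extern void conn_close(struct connection *conn);

    static void idle_unlink(struct frontend *fe, struct connection *c)
    {
        if (c->prev) c->prev->next = c->next; else fe->idle_head = c->next;
        if (c->next) c->next->prev = c->prev; else fe->idle_tail = c->prev;
        c->prev = c->next = NULL;
    }

    /* called on accept() : returns false only when no room can be made */
    static bool fe_may_accept(struct frontend *fe)
    {
        if (fe->curconn < fe->maxconn)
            return true;                  /* room left, accept as usual */

        if (fe->idle_head) {
            /* full, but an idle keep-alive connection exists: evict
             * the oldest one so the new connection can take its place */
            struct connection *victim = fe->idle_head;
            idle_unlink(fe, victim);
            conn_close(victim);
            fe->curconn--;
            return true;
        }
        return false;                     /* full and every connection busy */
    }

(The accept path stays O(1): the list head is always the oldest idle
connection, so making room costs a single unlink and close.)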

