Dear all,

some of you might be interested in the following tests, which also
provide a preview of the next NaviServer release.

Some recent OS kernels support SO_REUSEPORT [1], which allows multiple
threads to listen on the same port. The current development version of
NaviServer 4.99.15 uses the reuseport feature (if supported by the OS)
when the new config option "driverthreads" is set on a driver to a
value larger than 1. With this new feature, NaviServer supports
multi-threaded execution for all stages of a request (driverthreads,
spoolerthreads, connection threads, and writerthreads).
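
As a minimal sketch of the configuration (section path, address, and
port are placeholders for a typical setup, not taken from the test
machine), the new parameter goes into the driver's module section:

   ns_section "ns/server/default/module/nssock"
      # standard driver parameters (placeholder values)
      ns_param address        0.0.0.0
      ns_param port           8000
      # new in the 4.99.15 development version: number of driver
      # (listener) threads; a value > 1 activates SO_REUSEPORT
      # when the OS supports it
      ns_param driverthreads  2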

Although the implementation in NaviServer is not yet complete (prebind,
documentation, ... still missing), here are a few results from some
testing on an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz with 8 cores [2]
on a Linux box with kernel 3.13.0-86-generic. As expected, the benefit
will not be overwhelming for most applications, since the driver thread
uses async I/O and performs no complex computations; but still, the
results are measurable for this test. It was also not clear beforehand
whether the OS would actually make use of the multiple driver threads.

The test performs about 8 million identical requests to the
out-of-the-box NaviServer start page (a static page) using different
numbers of concurrent clients. The actual requests are measured by
weighttp [3].
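
For illustration, a single measurement with 100 concurrent clients can
be started with a command along the following lines; host, port, request
count, and the keep-alive flag are placeholders here, not the exact
invocation used for the numbers below:

   weighttp -k -n 1000000 -c 100 -t 8 http://127.0.0.1:8000/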

We can see that NaviServer processes up to ~62k requests per second,
with the best results at 130 concurrent clients.

1 driverthread:

Number        Requests per second
    of     ----------------------------
Clients      min       ave       max         Time
--------  --------  --------  --------  --------------
      10,    41537,    41623,    41712,   00:41:06
      20,    50468,    50567,    50643,   00:41:12
      30,    50991,    52123,    53439,   00:41:18
      40,    55269,    55518,    55906,   00:41:24
      50,    56124,    57238,    58526,   00:41:29
      60,    57912,    58774,    59471,   00:41:34
      70,    59162,    59791,    60676,   00:41:39
      80,    60369,    61261,    62273,   00:41:44
      90,    60550,    61569,    62234,   00:41:49
     100,    61696,    61866,    62142,   00:41:54
     110,    61844,    62255,    62809,   00:41:58
     120,    61539,    62225,    62890,   00:42:03
     130,    62539,    62726,    63007,   00:42:08
     140,    61408,    61791,    62364,   00:42:13
     150,    60133,    60477,    60863,   00:42:18
     160,    57031,    58027,    58556,   00:42:23
     170,    56698,    58295,    59615,   00:42:28
     180,    56532,    57784,    59081,   00:42:33
     190,    56854,    57331,    58030,   00:42:39
     200,    57057,    57505,    57741,   00:42:44
     210,    56710,    57207,    58001,   00:42:49
     220,    55877,    57129,    57857,   00:42:54
     230,    56414,    57521,    59239,   00:43:00
     240,    55205,    56078,    56871,   00:43:05
     250,    55325,    56398,    57146,   00:43:10

When we configure 2 driverthreads, the performance with a few clients
is, interestingly, somewhat worse (40k vs. 41k above), and the peak is
at a similar level, but the decrease after the peak is flatter. At 250
clients we now get 61k requests per second with 2 driverthreads instead
of 56k with one driverthread.

2 driverthreads:

  Number        Requests per second
    of     ----------------------------
Clients      min       ave       max         Time
--------  --------  --------  --------  --------------
      10,    40001,    40106,    40222,   00:49:20
      20,    49768,    50164,    50510,   00:49:26
      30,    52376,    52584,    52805,   00:49:32
      40,    56059,    56204,    56308,   00:49:38
      50,    56353,    56798,    57104,   00:49:43
      60,    58605,    58949,    59283,   00:49:48
      70,    59749,    59950,    60209,   00:49:53
      80,    61523,    61682,    61992,   00:49:58
      90,    62222,    62568,    63056,   00:50:03
     100,    62504,    62722,    62932,   00:50:07
     110,    62452,    62607,    62729,   00:50:12
     120,    62880,    62927,    62968,   00:50:17
     130,    62663,    62804,    62883,   00:50:22
     140,    62447,    62509,    62563,   00:50:27
     150,    61786,    62061,    62408,   00:50:31
     160,    60609,    61769,    62419,   00:50:36
     170,    60304,    61654,    62613,   00:50:41
     180,    60355,    60702,    60965,   00:50:46
     190,    60268,    60690,    61153,   00:50:51
     200,    61023,    61142,    61233,   00:50:56
     210,    61029,    61356,    62003,   00:51:01
     220,    60858,    61136,    61596,   00:51:06
     230,    61064,    61362,    61580,   00:51:11
     240,    60867,    61260,    61530,   00:51:16
     250,    61137,    61390,    61645,   00:51:20

The same trend continues with 10 driverthreads configured: the
performance with a few clients is, interestingly, even worse (now 37k),
while the throughput reaches a peak of ~62k requests per second with
more than 200 clients.

10 driverthreads:

Number        Requests per second
    of     ----------------------------
Clients      min       ave       max         Time
--------  --------  --------  --------  --------------
      10,    37443,    37657,    37875,   00:37:28
      20,    49141,    49492,    49697,   00:37:34
      30,    52764,    52887,    53081,   00:37:40
      40,    55834,    56531,    57154,   00:37:45
      50,    58638,    59156,    59624,   00:37:50
      60,    59130,    59391,    59731,   00:37:55
      70,    59807,    59920,    60005,   00:38:00
      80,    59863,    60001,    60173,   00:38:05
      90,    60107,    60167,    60244,   00:38:10
     100,    59107,    59718,    60360,   00:38:15
     110,    59480,    59927,    60496,   00:38:20
     120,    59751,    60105,    60505,   00:38:25
     130,    59479,    59635,    59799,   00:38:30
     140,    59560,    59974,    60529,   00:38:35
     150,    60571,    61301,    61760,   00:38:40
     160,    61361,    61620,    61939,   00:38:45
     170,    61261,    61821,    62154,   00:38:50
     180,    60878,    61285,    61737,   00:38:55
     190,    61039,    61553,    61863,   00:39:00
     200,    61701,    61841,    62088,   00:39:05
     210,    61665,    61933,    62222,   00:39:09
     220,    61562,    61669,    61844,   00:39:14
     230,    61448,    61634,    61785,   00:39:19
     240,    61711,    61963,    62124,   00:39:24
     250,    61531,    61789,    62154,   00:39:29

As the following output (produced by the new "ns_driver stats" command)
shows, Linux distributes the requests evenly over the driver threads:

name nsssl:0 received 0 spooled 0 partial 0 errors 0
name nssock:9 received 782662 spooled 0 partial 1051 errors 0
name nssock:8 received 786790 spooled 0 partial 1033 errors 0
name nssock:7 received 808231 spooled 0 partial 1054 errors 0
name nssock:6 received 813971 spooled 0 partial 1042 errors 0
name nssock:5 received 843588 spooled 0 partial 1070 errors 0
name nssock:4 received 727655 spooled 0 partial 982 errors 0
name nssock:3 received 777413 spooled 0 partial 1066 errors 0
name nssock:2 received 778254 spooled 0 partial 1009 errors 0
name nssock:1 received 824841 spooled 0 partial 1002 errors 0
name nssock:0 received 778979 spooled 0 partial 988 errors 0
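
As a side note, here is a small Tcl sketch (not part of NaviServer) that
sums the per-thread "received" counters; it assumes the stats are
available in $stats as the newline-separated key/value lines shown
above, which is an assumption about the command's output format:

   set total 0
   foreach line [split $stats \n] {
       # each line: name nssock:N received R spooled S partial P errors E
       if {[dict exists $line received]} {
           incr total [dict get $line received]
       }
   }
   ns_log notice "total received: $total requests"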

Certainly, for most real applications the benefit will be even smaller
than in this test. However, architectures with more and more cores are
coming, so maybe future applications on future hardware can benefit
more from this feature.

all the best

-gn


[1] https://lwn.net/Articles/542629/
[2] http://ark.intel.com/products/83356/Intel-Xeon-Processor-E5-2630-v3-20M-Cache-2_40-GHz
[3] https://github.com/lighttpd/weighttp

