Dear all, some of you might be interested in the following tests, which also provide a preview of the next NaviServer release.
Some recent OS kernels support SO_REUSEPORT [1], which allows multiple sockets (and therefore multiple threads) to listen on the same port. The current development version of NaviServer 4.99.15 uses this feature (if supported by the OS) when the new config option "driverthreads" is set on a driver to a value larger than 1. With this feature, NaviServer supports multi-threaded execution for all stages of a request (driver threads, spooler threads, connection threads, and writer threads).

Although the implementation is not yet complete (prebind, documentation, ... are still missing), here are a few results from testing on an Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz with 8 cores [2] on a Linux box running kernel 3.13.0-86-generic. As expected, the benefit will not be overwhelming for most applications, since the driver thread uses asynchronous I/O and performs no complex computations; still, the effect is measurable in this test. It was also not clear beforehand whether the OS would actually make use of the multiple driver threads.

The test performs about 8 million identical requests to the out-of-the-box NaviServer start page (a static page) with varying numbers of concurrent clients. The requests are generated and measured by weighttp [3]. We can see that NaviServer processes up to about 62k requests per second, with the best results at 130 concurrent clients.

1 driverthread:

 Number        Requests per second
   of      ----------------------------
 Clients      min       ave       max        Time
--------  --------  --------  --------  --------------
      10,    41537,    41623,    41712,   00:41:06
      20,    50468,    50567,    50643,   00:41:12
      30,    50991,    52123,    53439,   00:41:18
      40,    55269,    55518,    55906,   00:41:24
      50,    56124,    57238,    58526,   00:41:29
      60,    57912,    58774,    59471,   00:41:34
      70,    59162,    59791,    60676,   00:41:39
      80,    60369,    61261,    62273,   00:41:44
      90,    60550,    61569,    62234,   00:41:49
     100,    61696,    61866,    62142,   00:41:54
     110,    61844,    62255,    62809,   00:41:58
     120,    61539,    62225,    62890,   00:42:03
     130,    62539,    62726,    63007,   00:42:08
     140,    61408,    61791,    62364,   00:42:13
     150,    60133,    60477,    60863,   00:42:18
     160,    57031,    58027,    58556,   00:42:23
     170,    56698,    58295,    59615,   00:42:28
     180,    56532,    57784,    59081,   00:42:33
     190,    56854,    57331,    58030,   00:42:39
     200,    57057,    57505,    57741,   00:42:44
     210,    56710,    57207,    58001,   00:42:49
     220,    55877,    57129,    57857,   00:42:54
     230,    56414,    57521,    59239,   00:43:00
     240,    55205,    56078,    56871,   00:43:05
     250,    55325,    56398,    57146,   00:43:10
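For orientation, enabling the feature in a config file might look roughly like the sketch below. Only "driverthreads" is the new option discussed above; the section path, the chosen values, and the other parameters follow common nssock examples and may differ in your installation.

    # Minimal sketch of a driver section in a NaviServer config file (Tcl).
    # Only "driverthreads" is the new option; section path and the remaining
    # parameters are just the customary nssock settings used for illustration.
    ns_section "ns/server/default/module/nssock"
    ns_param   address         0.0.0.0
    ns_param   port            8000
    ns_param   spoolerthreads  1   ;# upload spooler threads (existing option)
    ns_param   writerthreads   1   ;# writer threads for delivery (existing option)
    ns_param   driverthreads   2   ;# new: >1 enables SO_REUSEPORT listening, if the kernel supports it

With e.g. driverthreads set to 2, each driver thread opens its own listening socket on the same port via SO_REUSEPORT, and the kernel spreads incoming connections over these sockets.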
When we configure 2 driver threads, the performance with a few clients is, interestingly, worse (40k vs. 41k above) and the peak is at a similar level, but the decline after the peak is flatter: at 250 clients we now see 61k requests per second with 2 driver threads instead of 56k with one.

2 driverthreads:

 Number        Requests per second
   of      ----------------------------
 Clients      min       ave       max        Time
--------  --------  --------  --------  --------------
      10,    40001,    40106,    40222,   00:49:20
      20,    49768,    50164,    50510,   00:49:26
      30,    52376,    52584,    52805,   00:49:32
      40,    56059,    56204,    56308,   00:49:38
      50,    56353,    56798,    57104,   00:49:43
      60,    58605,    58949,    59283,   00:49:48
      70,    59749,    59950,    60209,   00:49:53
      80,    61523,    61682,    61992,   00:49:58
      90,    62222,    62568,    63056,   00:50:03
     100,    62504,    62722,    62932,   00:50:07
     110,    62452,    62607,    62729,   00:50:12
     120,    62880,    62927,    62968,   00:50:17
     130,    62663,    62804,    62883,   00:50:22
     140,    62447,    62509,    62563,   00:50:27
     150,    61786,    62061,    62408,   00:50:31
     160,    60609,    61769,    62419,   00:50:36
     170,    60304,    61654,    62613,   00:50:41
     180,    60355,    60702,    60965,   00:50:46
     190,    60268,    60690,    61153,   00:50:51
     200,    61023,    61142,    61233,   00:50:56
     210,    61029,    61356,    62003,   00:51:01
     220,    60858,    61136,    61596,   00:51:06
     230,    61064,    61362,    61580,   00:51:11
     240,    60867,    61260,    61530,   00:51:16
     250,    61137,    61390,    61645,   00:51:20

The trend continues with 10 driver threads configured: the performance with a few clients is, interestingly, even worse (now 37k), and the throughput reaches a peak of ~62k requests per second with more than 200 clients.

10 driverthreads:

 Number        Requests per second
   of      ----------------------------
 Clients      min       ave       max        Time
--------  --------  --------  --------  --------------
      10,    37443,    37657,    37875,   00:37:28
      20,    49141,    49492,    49697,   00:37:34
      30,    52764,    52887,    53081,   00:37:40
      40,    55834,    56531,    57154,   00:37:45
      50,    58638,    59156,    59624,   00:37:50
      60,    59130,    59391,    59731,   00:37:55
      70,    59807,    59920,    60005,   00:38:00
      80,    59863,    60001,    60173,   00:38:05
      90,    60107,    60167,    60244,   00:38:10
     100,    59107,    59718,    60360,   00:38:15
     110,    59480,    59927,    60496,   00:38:20
     120,    59751,    60105,    60505,   00:38:25
     130,    59479,    59635,    59799,   00:38:30
     140,    59560,    59974,    60529,   00:38:35
     150,    60571,    61301,    61760,   00:38:40
     160,    61361,    61620,    61939,   00:38:45
     170,    61261,    61821,    62154,   00:38:50
     180,    60878,    61285,    61737,   00:38:55
     190,    61039,    61553,    61863,   00:39:00
     200,    61701,    61841,    62088,   00:39:05
     210,    61665,    61933,    62222,   00:39:09
     220,    61562,    61669,    61844,   00:39:14
     230,    61448,    61634,    61785,   00:39:19
     240,    61711,    61963,    62124,   00:39:24
     250,    61531,    61789,    62154,   00:39:29

As the following output (produced by the new "ns_driver stats" command) shows, Linux distributes the requests evenly over the driver threads:

name nsssl:0 received 0      spooled 0 partial 0    errors 0
name nssock:9 received 782662 spooled 0 partial 1051 errors 0
name nssock:8 received 786790 spooled 0 partial 1033 errors 0
name nssock:7 received 808231 spooled 0 partial 1054 errors 0
name nssock:6 received 813971 spooled 0 partial 1042 errors 0
name nssock:5 received 843588 spooled 0 partial 1070 errors 0
name nssock:4 received 727655 spooled 0 partial 982  errors 0
name nssock:3 received 777413 spooled 0 partial 1066 errors 0
name nssock:2 received 778254 spooled 0 partial 1009 errors 0
name nssock:1 received 824841 spooled 0 partial 1002 errors 0
name nssock:0 received 778979 spooled 0 partial 988  errors 0

Certainly, for most real applications, the benefits will be even smaller than in this test. However, architectures with more and more cores are coming, so maybe future applications on future hardware can benefit more from this feature.
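In case someone wants to post-process these counters, the per-driver distribution can also be summarized from Tcl. The sketch below assumes that "ns_driver stats" returns one key/value list per driver instance with the fields shown in the listing above (name, received, spooled, partial, errors); treat this return format as an assumption rather than documented behavior.

    # Sketch: sum up the "received" counters and report each driver's share.
    # Assumes "ns_driver stats" returns a list of key/value lists as above.
    set stats [ns_driver stats]
    set total 0
    foreach d $stats {
        incr total [dict get $d received]
    }
    foreach d $stats {
        set received [dict get $d received]
        if {$total > 0} {
            ns_log notice [format "%-10s received %8d (%5.1f%%)" \
                               [dict get $d name] $received \
                               [expr {100.0 * $received / $total}]]
        }
    }

Such a snippet could be run periodically (e.g. via ns_schedule_proc) to watch how the load is spread over the driver threads.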
all the best
-gn

[1] https://lwn.net/Articles/542629/
[2] http://ark.intel.com/products/83356/Intel-Xeon-Processor-E5-2630-v3-20M-Cache-2_40-GHz
[3] https://github.com/lighttpd/weighttp