[https://issues.apache.org/jira/browse/CASSANDRA-7761?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14155426#comment-14155426]
T Jake Luciani edited comment on CASSANDRA-7761 at 10/1/14 8:20 PM:
--------------------------------------------------------------------
I've done some work to further optimize our netty server and gotten some
significant gains.
https://github.com/tjake/cassandra/tree/netty-perf
The primary change is to avoid using a separate thread pool for the dispatch
step and instead re-use the NIO threads. This cuts a large amount of latency
from each request, as you can see from the runs below (at 4, 8 and 16 threads;
beyond that C* becomes the bottleneck). It also cuts a large amount of CPU
usage and thread switching: user + system CPU time for the server drops from
177.22s to 124.10s in the runs below.
I also switched on Netty's native epoll transport.
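The actual change lives in the linked branch. What follows is only a hypothetical, JDK-only model of the idea, with invented class and thread names: a single-threaded executor stands in for a Netty event loop, and a fixed pool stands in for the separate dispatch executor. It records which thread runs each stage of a request (read, execute, write) and counts how many times the request crosses threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model, not Cassandra or Netty code.
public class DispatchSketch {

    /** Names threads "prefix-1", "prefix-2", ... so each stage's thread is identifiable. */
    static ThreadFactory named(String prefix) {
        AtomicInteger n = new AtomicInteger();
        return r -> new Thread(r, prefix + "-" + n.incrementAndGet());
    }

    /** Handles one request; returns the names of the threads that ran read, execute, write. */
    static List<String> handle(ExecutorService ioLoop, ExecutorService dispatchPool,
                               boolean inline) throws Exception {
        List<String> stages = new CopyOnWriteArrayList<>();
        CompletableFuture<Void> done = new CompletableFuture<>();
        ioLoop.execute(() -> {
            stages.add(Thread.currentThread().getName()); // read/decode on the I/O thread
            Runnable execute = () -> {
                stages.add(Thread.currentThread().getName()); // run the request
                // The response is always written from the I/O thread.
                ioLoop.execute(() -> {
                    stages.add(Thread.currentThread().getName()); // write the response
                    done.complete(null);
                });
            };
            if (inline) {
                execute.run();                 // stay on the I/O thread
            } else {
                dispatchPool.execute(execute); // hand off to the dispatch pool
            }
        });
        done.get(5, TimeUnit.SECONDS);
        return new ArrayList<>(stages);
    }

    /** Counts how many times consecutive stages ran on different threads. */
    static int handoffs(List<String> stages) {
        int count = 0;
        for (int i = 1; i < stages.size(); i++) {
            if (!stages.get(i).equals(stages.get(i - 1))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService ioLoop = Executors.newSingleThreadExecutor(named("io-loop"));
        ExecutorService dispatchPool = Executors.newFixedThreadPool(4, named("dispatch"));
        try {
            List<String> offloaded = handle(ioLoop, dispatchPool, false);
            List<String> inline = handle(ioLoop, dispatchPool, true);
            System.out.println("offloaded " + offloaded + ", handoffs=" + handoffs(offloaded));
            System.out.println("inline    " + inline + ", handoffs=" + handoffs(inline));
        } finally {
            ioLoop.shutdown();
            dispatchPool.shutdown();
        }
    }
}
```

With inline execution all three stages run on the same thread. The offloaded path crosses threads twice per request, and that is the latency and context-switch cost described above. The trade-off, which this model does not capture, is that a slow request now holds up every other connection served by the same event loop.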
For a purely in-cache response I can now get > 100k requests/sec on my laptop,
vs 80k previously.
Sharing the thread pool also lets netty naturally batch responses, so we no
longer need CASSANDRA-5663.
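Because responses are now produced on the same event-loop thread that read the requests, the responses generated while draining one batch of reads can go out in a single flush. Below is a hypothetical, JDK-only model of that effect: the `Channel` class and the per-iteration loop are invented for illustration and are not Netty's code. `write()` only buffers, and each `flush()` stands in for one socket write. In Netty 4 a common way to express this is to `write()` each message and `flush()` once from `channelReadComplete`, rather than calling `writeAndFlush()` per message. Whether the linked branch does exactly that is not stated here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of flush coalescing, not Netty's implementation.
public class FlushBatchingSketch {

    /** Channel stand-in: write() only buffers, flush() is where bytes would hit the socket. */
    static final class Channel {
        private final StringBuilder pending = new StringBuilder();
        final List<String> flushedWrites = new ArrayList<>(); // one entry per socket write

        void write(String response) {
            pending.append(response);
        }

        void flush() {
            if (pending.length() > 0) {
                flushedWrites.add(pending.toString());
                pending.setLength(0);
            }
        }
    }

    /** Hypothetical request execution; in this model it just echoes the request. */
    static String execute(String request) {
        return "OK(" + request + ");";
    }

    /** One event-loop pass over the requests that became readable together. */
    static void runIteration(Channel ch, List<String> readyRequests, boolean coalesce) {
        for (String request : readyRequests) {
            ch.write(execute(request));
            if (!coalesce) {
                ch.flush(); // one socket write per response
            }
        }
        if (coalesce) {
            ch.flush(); // one socket write for the whole batch
        }
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("q1", "q2", "q3", "q4");

        Channel perResponse = new Channel();
        runIteration(perResponse, batch, false);

        Channel coalesced = new Channel();
        runIteration(coalesced, batch, true);

        System.out.println("flush per response: " + perResponse.flushedWrites.size() + " socket writes");
        System.out.println("coalesced:          " + coalesced.flushedWrites.size() + " socket write "
                + coalesced.flushedWrites);
    }
}
```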
I need to do further testing on our cstar clusters, but initial results look
promising and the code is cleaner.
{code}
--Current trunk--
./bin/cassandra -f 152.08s user 25.14s system 173% cpu 1:42.06 total
Results:
op rate : 45473
partition rate : 45473
row rate : 45473
latency mean : 20.2
latency median : 16.9
latency 95th percentile : 48.4
latency 99th percentile : 75.2
latency 99.9th percentile : 116.4
latency max : 352.5
total gc count : 39
total gc mb : 12477
total gc time (s) : 2
avg gc time(ms) : 43
stdev gc time(ms) : 10
Total operation time : 00:00:33
Improvement over 609 threadCount: 3%
id, total ops , adj row/s, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, gc: #, max ms, sum ms, sdv ms, mb
4 threadCount, 372704 , -0, 12150, 12150, 12150, 0.3, 0.3, 0.5, 0.9, 4.0, 44.7, 30.7, 0.01078, 20, 625, 625, 5, 6352
8 threadCount, 566480 , 18307, 18289, 18289, 18289, 0.4, 0.4, 0.7, 1.3, 5.6, 56.9, 31.0, 0.01124, 23, 781, 781, 6, 7320
16 threadCount, 771731 , 24763, 24739, 24739, 24739, 0.6, 0.5, 1.2, 2.5, 18.3, 66.5, 31.2, 0.01758, 25, 885, 885, 7, 7980
24 threadCount, 916588 , 29341, 29312, 29312, 29312, 0.8, 0.6, 1.6, 3.7, 12.3, 52.5, 31.3, 0.01256, 26, 899, 899, 6, 8308
36 threadCount, 1039678 , 33081, 33068, 33068, 33068, 1.1, 0.8, 2.2, 5.9, 23.5, 59.2, 31.4, 0.00985, 29, 986, 986, 7, 9271
54 threadCount, 1123610 , 35823, 35780, 35780, 35780, 1.5, 1.1, 3.2, 8.9, 36.1, 83.6, 31.4, 0.02015, 30, 1104, 1169, 11, 9581
81 threadCount, 1185809 , -0, 37260, 37260, 37260, 2.2, 1.6, 4.7, 13.0, 44.0, 300.7, 31.8, 0.01640, 32, 1074, 1169, 8, 10235
121 threadCount, 1275470 , -0, 40124, 40124, 40124, 3.0, 2.4, 6.3, 14.7, 43.7, 71.0, 31.8, 0.01488, 33, 1053, 1152, 7, 10556
181 threadCount, 1326379 , 41472, 41413, 41413, 41413, 4.4, 3.5, 9.3, 22.1, 51.4, 84.0, 32.0, 0.01061, 34, 1116, 1277, 7, 10876
271 threadCount, 1340955 , 41138, 41060, 41060, 41060, 6.6, 4.9, 17.8, 36.3, 63.2, 125.8, 32.7, 0.01902, 35, 1234, 1375, 11, 11191
406 threadCount, 1418529 , 43033, 42978, 42978, 42978, 9.4, 7.3, 24.3, 44.2, 81.3, 299.8, 33.0, 0.01606, 36, 1172, 1432, 9, 11517
609 threadCount, 1465946 , 44234, 44159, 44159, 44159, 13.8, 11.6, 33.5, 53.3, 94.5, 135.8, 33.2, 0.01105, 37, 1183, 1428, 9, 11837
913 threadCount, 1544584 , 45547, 45473, 45473, 45473, 20.2, 16.9, 48.4, 75.2, 116.4, 352.5, 34.0, 0.00953, 39, 1324, 1663, 10, 12477
END
--Netty fixes--
./bin/cassandra -f 110.27s user 13.83s system 116% cpu 1:46.80 total
Results:
op rate : 45506
partition rate : 45506
row rate : 45506
latency mean : 20.1
latency median : 17.3
latency 95th percentile : 40.6
latency 99th percentile : 69.4
latency 99.9th percentile : 148.4
latency max : 261.7
total gc count : 38
total gc mb : 12154
total gc time (s) : 2
avg gc time(ms) : 40
stdev gc time(ms) : 10
Total operation time : 00:00:33
Improvement over 609 threadCount: 2%
id, total ops , adj row/s, op/s, pk/s, row/s, mean, med, .95, .99, .999, max, time, stderr, gc: #, max ms, sum ms, sdv ms, mb
4 threadCount, 549137 , -0, 17810, 17810, 17810, 0.2, 0.2, 0.3, 0.7, 4.0, 50.8, 30.8, 0.01549, 26, 843, 843, 5, 8259
8 threadCount, 712047 , 22991, 22988, 22988, 22988, 0.3, 0.3, 0.5, 1.1, 5.0, 75.6, 31.0, 0.01156, 24, 896, 896, 10, 7643
16 threadCount, 854794 , 27261, 27261, 27261, 27261, 0.6, 0.5, 0.8, 1.6, 6.7, 675.7, 31.4, 0.00871, 25, 865, 865, 6, 7983
24 threadCount, 937211 , 30181, 30151, 30151, 30151, 0.8, 0.7, 1.1, 2.1, 9.9, 63.7, 31.1, 0.00833, 25, 897, 897, 9, 7990
36 threadCount, 991671 , 31797, 31791, 31791, 31791, 1.1, 1.0, 1.6, 2.8, 30.6, 60.6, 31.2, 0.00628, 26, 860, 860, 4, 8313
54 threadCount, 1004934 , 32155, 32124, 32124, 32124, 1.7, 1.5, 2.3, 4.2, 33.4, 82.7, 31.3, 0.01421, 26, 1019, 1019, 13, 8303
81 threadCount, 1060294 , -0, 33734, 33734, 33734, 2.4, 2.2, 3.3, 5.3, 34.1, 54.7, 31.4, 0.00888, 27, 866, 866, 4, 8636
121 threadCount, 1065139 , -0, 33931, 33931, 33931, 3.6, 3.3, 4.9, 7.6, 38.0, 83.7, 31.4, 0.00646, 27, 954, 954, 10, 8636
181 threadCount, 1058899 , -0, 33635, 33635, 33635, 5.4, 4.9, 7.4, 11.9, 49.0, 626.1, 31.5, 0.01520, 27, 944, 944, 8, 8637
271 threadCount, 1219920 , -0, 38376, 38376, 38376, 7.1, 6.1, 11.0, 25.6, 74.0, 532.4, 31.8, 0.01698, 31, 1198, 1296, 16, 9904
406 threadCount, 1438641 , 44469, 44426, 44426, 44426, 9.1, 8.0, 14.7, 39.7, 76.4, 595.8, 32.4, 0.01055, 36, 1151, 1385, 10, 11504
609 threadCount, 1502339 , 44705, 44705, 44705, 44705, 13.6, 11.6, 27.6, 62.1, 105.3, 156.2, 33.6, 0.01404, 38, 1325, 1625, 12, 12095
913 threadCount, 1529752 , -0, 45506, 45506, 45506, 20.1, 17.3, 40.6, 69.4, 148.4, 261.7, 33.6, 0.01656, 38, 1221, 1509, 10, 12154
{code}
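The two runs can be reduced to a few ratios. The small JDK-only Java program below only re-derives figures from the numbers quoted in the tables and the two time(1) lines: op/s per thread count, and user + system CPU seconds for the server process. It measures nothing itself.

```java
import java.util.Locale;

// Recomputes relative changes from the numbers quoted above; inputs are copied, not measured.
public class StressComparison {
    static final int[] THREADS = {4, 8, 16, 24, 36, 54, 81, 121, 181, 271, 406, 609, 913};
    // op/s column, "Current trunk" run
    static final int[] TRUNK_OPS = {12150, 18289, 24739, 29312, 33068, 35780, 37260,
            40124, 41413, 41060, 42978, 44159, 45473};
    // op/s column, "Netty fixes" run
    static final int[] NETTY_OPS = {17810, 22988, 27261, 30151, 31791, 32124, 33734,
            33931, 33635, 38376, 44426, 44705, 45506};

    /** Percentage change from before to after, e.g. 100 -> 125 gives 25.0. */
    static double pctChange(double before, double after) {
        return (after - before) / before * 100.0;
    }

    public static void main(String[] args) {
        for (int i = 0; i < THREADS.length; i++) {
            System.out.printf(Locale.ROOT, "%4d threads: %6d -> %6d op/s (%+.1f%%)%n",
                    THREADS[i], TRUNK_OPS[i], NETTY_OPS[i], pctChange(TRUNK_OPS[i], NETTY_OPS[i]));
        }
        // user + system CPU seconds reported by time(1) for ./bin/cassandra -f
        double trunkCpu = 152.08 + 25.14;
        double nettyCpu = 110.27 + 13.83;
        System.out.printf(Locale.ROOT, "server CPU: %.2fs -> %.2fs (%+.1f%%)%n",
                trunkCpu, nettyCpu, pctChange(trunkCpu, nettyCpu));
    }
}
```

At 4, 8 and 16 threads this gives roughly +46.6%, +25.7% and +10.2% op/s, and total server CPU falls by about 30% (177.22s to 124.10s). It also shows that the 54 to 181 thread rows come in below trunk in these particular runs.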
> Upgrade netty
> -------------
>
> Key: CASSANDRA-7761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7761
> Project: Cassandra
> Issue Type: Improvement
> Reporter: T Jake Luciani
> Assignee: T Jake Luciani
> Priority: Minor
> Fix For: 2.1.1
>
>
> Latest netty contains the proper fix for CASSANDRA-7695 plus some of the
> performance patches [~benedict] contributed. We should upgrade to this,
> following extensive burn-in testing.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)