[https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249085#comment-14249085]
Ariel Weisberg commented on CASSANDRA-8457:
-------------------------------------------
I have some code and results. https://github.com/aweisberg/cassandra/tree/C-8457
I tested on AWS using a 3 node cluster of c3.8xlarge instances in the same
placement group using HVM with Ubuntu 14.04. Other than
/etc/security/limits.conf, I made no changes to the install, which was the
RightScale Base ServerTemplate for Linux (v14.1.0).
The config provided to cstar bootstrap was:
{code:JavaScript}
{
    "revision": "aweisberg/C-8457",
    "label": "test",
    "yaml": "key_cache_size_in_mb: 256\nrow_cache_size_in_mb: 2000\ncommitlog_sync: periodic\ncommitlog_sync_batch_window_in_ms: null\ncommitlog_sync_period_in_ms: 10000\ncompaction_throughput_mb_per_sec: 0\nconcurrent_compactors: 4",
    "env": "MAX_HEAP_SIZE=8g\nHEAP_NEWSIZE=2g",
    "options": {
        "use_vnodes": true
    }
}
{
    "commitlog_directory": "/mnt/ephemeral/commitlog",
    "data_file_directories": [
        "/mnt/ephemeral/datadir"
    ],
    "block_devices": [
        "/dev/mapper/vg--data-ephemeral0"
    ],
    "blockdev_readahead": "128",
    "hosts": {
        "ec2-54-175-1-84.compute-1.amazonaws.com": {
            "internal_ip": "172.31.49.199",
            "hostname": "ec2-54-175-1-84.compute-1.amazonaws.com",
            "seed": true
        },
        "ec2-54-175-32-238.compute-1.amazonaws.com": {
            "internal_ip": "172.31.53.77",
            "hostname": "ec2-54-175-32-238.compute-1.amazonaws.com",
            "seed": true
        },
        "ec2-54-175-32-206.compute-1.amazonaws.com": {
            "internal_ip": "172.31.57.63",
            "hostname": "ec2-54-175-32-206.compute-1.amazonaws.com",
            "seed": true
        }
    },
    "user": "ariel_weisberg",
    "name": "example1",
    "saved_caches_directory": "/mnt/ephemeral/caches"
}
{code}
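Note that the {{yaml}} field is a single string of newline-separated cassandra.yaml overrides. A minimal sketch (Python, my own illustration, not part of cstar) of how that string expands into individual settings:

```python
# Expand the newline-separated "yaml" override string from the cstar
# bootstrap config above into individual cassandra.yaml settings.
# The string content is copied verbatim from the config.
yaml_overrides = ("key_cache_size_in_mb: 256\n"
                  "row_cache_size_in_mb: 2000\n"
                  "commitlog_sync: periodic\n"
                  "commitlog_sync_batch_window_in_ms: null\n"
                  "commitlog_sync_period_in_ms: 10000\n"
                  "compaction_throughput_mb_per_sec: 0\n"
                  "concurrent_compactors: 4")

# Split each "key: value" line into a key/value pair.
settings = dict(line.split(": ", 1) for line in yaml_overrides.splitlines())
print(settings["commitlog_sync"])         # periodic
print(settings["concurrent_compactors"])  # 4
```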
To populate data I used:
bq. ./cassandra-stress write n=100000 -pop seq=1...100000 no-wrap -rate threads=50 -col 'n=fixed(1)' -schema 'replication(factor=3)' -node file=$HOME/hosts
To read I used:
bq. ./cassandra-stress read n=10000000 cl=ALL -pop 'dist=UNIFORM(1...100000)' -rate threads=200 -col 'n=fixed(1)' -schema 'replication(factor=3)' -node file=~/hosts
I ran two stress client instances, one each on two additional c3.8xlarge nodes
in the same placement group.
Unmodified trunk
{noformat}
op rate : 87497
partition rate : 87497
row rate : 87497
latency mean : 2.3
latency median : 2.1
latency 95th percentile : 3.2
latency 99th percentile : 3.7
latency 99.9th percentile : 4.5
latency max : 124.0
total gc count : 28
total gc mb : 44299
total gc time (s) : 1
avg gc time(ms) : 21
stdev gc time(ms) : 17
Total operation time : 00:01:54
END
op rate : 87598
partition rate : 87598
row rate : 87598
latency mean : 2.3
latency median : 2.1
latency 95th percentile : 3.2
latency 99th percentile : 3.8
latency 99.9th percentile : 4.4
latency max : 124.8
total gc count : 133
total gc mb : 211358
total gc time (s) : 3
avg gc time(ms) : 20
stdev gc time(ms) : 17
Total operation time : 00:01:54
END
{noformat}
Modified
{noformat}
Results:
op rate : 87476
partition rate : 87476
row rate : 87476
latency mean : 2.3
latency median : 2.1
latency 95th percentile : 3.2
latency 99th percentile : 3.7
latency 99.9th percentile : 4.0
latency max : 130.2
total gc count : 102
total gc mb : 165487
total gc time (s) : 3
avg gc time(ms) : 25
stdev gc time(ms) : 21
Total operation time : 00:01:54
END
Results:
op rate : 87347
partition rate : 87347
row rate : 87347
latency mean : 2.3
latency median : 2.1
latency 95th percentile : 3.1
latency 99th percentile : 3.6
latency 99.9th percentile : 3.9
latency max : 129.2
total gc count : 59
total gc mb : 93416
total gc time (s) : 1
avg gc time(ms) : 23
stdev gc time(ms) : 16
Total operation time : 00:01:54
END
{noformat}
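For a quick side-by-side comparison, combining the two client instances per run (all numbers copied from the outputs above; this aggregation script is my addition, not part of the stress harness):

```python
# Aggregate the two stress-client outputs per configuration and compare.
# Every number below is copied verbatim from the results above.
trunk    = {"op_rate": [87497, 87598], "gc_mb": [44299, 211358], "gc_s": [1, 3]}
modified = {"op_rate": [87476, 87347], "gc_mb": [165487, 93416], "gc_s": [3, 1]}

for name, run in (("trunk", trunk), ("modified", modified)):
    print(f"{name}: combined op rate {sum(run['op_rate'])}, "
          f"total GC {sum(run['gc_mb'])} MB in {sum(run['gc_s'])} s")

# The combined throughput difference is well under 1%.
delta = (sum(modified["op_rate"]) - sum(trunk["op_rate"])) / sum(trunk["op_rate"])
print(f"op-rate delta: {delta:+.2%}")
```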
[~benedict] Can you look at the code and the stress params and validate that
you think I am measuring what I think I am measuring?
I am going to profile the client and server tomorrow to get my bearings on what
executing this workload actually looks like.
> nio MessagingService
> --------------------
>
> Key: CASSANDRA-8457
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
> Project: Cassandra
> Issue Type: New Feature
> Components: Core
> Reporter: Jonathan Ellis
> Assignee: Ariel Weisberg
> Labels: performance
> Fix For: 3.0
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big
> contributor to context switching, especially for larger clusters. Let's look
> at switching to nio, possibly via Netty.