[
https://issues.apache.org/jira/browse/CASSANDRA-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496077#comment-16496077
]
Lerh Chuan Low edited comment on CASSANDRA-10540 at 5/31/18 4:34 AM:
---------------------------------------------------------------------
Hi [~krummas],
Sorry for the delay; here are some initial benchmarks. I've only tried it with
LCS. This is the stressspec YAML, a reasonably stressful test:
{code:java}
keyspace: stresscql2
keyspace_definition: |
  CREATE KEYSPACE stresscql2 WITH replication = {'class': 'NetworkTopologyStrategy', 'Waboku': 3, 'Bokusapp': 2};

table: typestest
table_definition: |
  CREATE TABLE typestest (
    name text,
    choice boolean,
    date timestamp,
    address inet,
    dbl double,
    lval bigint,
    ival int,
    uid timeuuid,
    value blob,
    PRIMARY KEY((name,choice), date, address, dbl, lval, ival, uid)
  ) WITH compaction = { 'class':'LeveledCompactionStrategy', 'range_aware_compaction':'true', 'min_range_sstable_size_in_mb':'15' }
    AND comment='A table of many types to test wide rows'

columnspec:
  - name: name
    size: uniform(1..1000)
    population: uniform(1..500M)  # the range of unique values to select for the field (default is 100Billion)
  - name: date
    cluster: uniform(20..1000)
  - name: lval
    population: gaussian(1..1000)
    cluster: uniform(1..4)
  - name: value
    size: uniform(100..500)

insert:
  partitions: fixed(1)        # number of unique partitions to update in a single operation
  batchtype: UNLOGGED         # type of batch to use
  select: uniform(1..10)/10   # uniform chance any single generated CQL row will be visited in a partition;

queries:
  simple1:
    cql: select * from typestest where name = ? and choice = ? LIMIT 1
    fields: samerow
  range1:
    cql: select name, choice, uid from typestest where name = ? and choice = ? and date >= ? LIMIT 10
    fields: multirow
  simple2:
    cql: select name, choice, uid from typestest where name = ? and choice = ? LIMIT 1
    fields: samerow           # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
{code}
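The Non-RACS runs presumably use the same schema with plain LCS (no range-aware options). For reference, a minimal sketch of switching an existing table between the two configurations; the option names are the ones from the profile above and only exist on the CASSANDRA-10540 branch, so treat this as an assumption about that branch rather than released Cassandra:
{code:java}
# Minimal sketch of toggling the table between the two configurations.
# Assumptions: the option names are the ones used in the profile above and
# only exist on the CASSANDRA-10540 branch, not in released Cassandra.
cqlsh -e "ALTER TABLE stresscql2.typestest WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'range_aware_compaction': 'true',
  'min_range_sstable_size_in_mb': '15' };"

# Plain LCS baseline (assumed to be what the Non-RACS runs used).
cqlsh -e "ALTER TABLE stresscql2.typestest WITH compaction = {
  'class': 'LeveledCompactionStrategy' };"
{code}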
This was done on a multi-DC cluster in EC2 (400GB SSDs), with 3 nodes in one DC
and 2 nodes in the other. Stress replicates to both DCs.
For inserts:
{code:java}
nohup cassandra-stress user no-warmup profile=stressspec.yaml n=150000000 cl=QUORUM ops\(insert=1\) -node file=nodelist.txt -rate threads=100 -log file=insert.log > nohup.txt &
{code}
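Since the interesting difference between the two runs is compaction behaviour, something like the following can be used to keep an eye on compactions on a node while stress is running; this is just plain nodetool and wasn't part of the commands above:
{code:java}
# Hedged sketch: watch pending/active compactions and recent compaction
# history on a node while the stress run is going. Plain nodetool; nothing
# RACS-specific, and not something the benchmark itself depends on.
watch -n 30 'nodetool compactionstats -H; nodetool compactionhistory | tail -5'
{code}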
We have
|| ||RACS||NonRACS||
|Stress result|Op rate : 8,784 op/s [insert: 8,784 op/s]
Partition rate : 8,784 pk/s [insert: 8,784 pk/s]
Row rate : 8,784 row/s [insert: 8,784 row/s]
Latency mean : 5.4 ms [insert: 5.4 ms]
Latency median : 4.3 ms [insert: 4.3 ms]
Latency 95th percentile : 8.4 ms [insert: 8.4 ms]
Latency 99th percentile : 39.2 ms [insert: 39.2 ms]
Latency 99.9th percentile : 63.3 ms [insert: 63.3 ms]
Latency max : 1506.8 ms [insert: 1,506.8 ms]
Total partitions : 150,000,000 [insert: 150,000,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 04:44:35|Op rate : 8,730 op/s [insert: 8,730 op/s]
Partition rate : 8,730 pk/s [insert: 8,730 pk/s]
Row rate : 8,730 row/s [insert: 8,730 row/s]
Latency mean : 5.4 ms [insert: 5.4 ms]
Latency median : 4.3 ms [insert: 4.3 ms]
Latency 95th percentile : 8.5 ms [insert: 8.5 ms]
Latency 99th percentile : 39.4 ms [insert: 39.4 ms]
Latency 99.9th percentile : 66.1 ms [insert: 66.1 ms]
Latency max : 944.8 ms [insert: 944.8 ms]
Total partitions : 150,000,000 [insert: 150,000,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 04:46:22|
|SSTable count|1339
1259
1342
1285
1333|743
750
747
737
741|
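The SSTable counts are one line per node (5 nodes). A minimal sketch of collecting them, assuming the numbers come from nodetool tablestats on each host in nodelist.txt:
{code:java}
# Hedged sketch: collect the per-node SSTable counts shown above
# (assumption: nodetool tablestats is the source of these numbers).
for host in $(cat nodelist.txt); do
  echo -n "$host "
  ssh "$host" nodetool tablestats stresscql2.typestest | grep 'SSTable count'
done
{code}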
The mixed workload was run after the inserts, so reads are not served straight
from the OS page cache:
{code:java}
nohup cassandra-stress user no-warmup profile=stressspec.yaml duration=2h cl=QUORUM ops\(insert=10,simple1=10,range1=1\) -node file=nodelist.txt -rate threads=50 -log file=mixed.log > nohup.txt &
{code}
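The stress summaries in the tables are the end-of-run figures from the log files named in the commands; a quick way to pull them out:
{code:java}
# Hedged sketch: pull the end-of-run summary lines out of the stress logs
# (file names as in the stress commands above).
grep -E 'Op rate|Partition rate|Row rate|Latency|Total' insert.log mixed.log
{code}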
|| ||RACS||Non RACS||
|Stress result |Op rate : 415 op/s [insert: 197 op/s, range1: 20 op/s, simple1:
198 op/s]
Partition rate : 407 pk/s [insert: 197 pk/s, range1: 12 pk/s, simple1: 198
pk/s]
Row rate : 412 row/s [insert: 197 row/s, range1: 17 row/s, simple1: 198 row/s]
Latency mean : 120.4 ms [insert: 2.3 ms, range1: 227.0 ms, simple1: 227.3 ms]
Latency median : 38.0 ms [insert: 2.0 ms, range1: 207.0 ms, simple1: 207.4 ms]
Latency 95th percentile : 454.6 ms [insert: 3.1 ms, range1: 541.1 ms, simple1:
543.2 ms]
Latency 99th percentile : 673.2 ms [insert: 5.1 ms, range1: 739.2 ms, simple1:
741.3 ms]
Latency 99.9th percentile : 918.0 ms [insert: 43.4 ms, range1: 985.1 ms,
simple1: 975.2 ms]
Latency max : 1584.4 ms [insert: 766.0 ms, range1: 1,426.1 ms, simple1:
1,584.4 ms]
Total partitions : 2,930,512 [insert: 1,419,222, range1: 86,021, simple1:
1,425,269]
Total errors : 0 [insert: 0, range1: 0, simple1: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 02:00:01|Op rate : 382 op/s [insert: 182 op/s, range1:
18 op/s, simple1: 182 op/s]
Partition rate : 375 pk/s [insert: 182 pk/s, range1: 11 pk/s, simple1: 182
pk/s]
Row rate : 379 row/s [insert: 182 row/s, range1: 15 row/s, simple1: 182 row/s]
Latency mean : 130.8 ms [insert: 2.6 ms, range1: 247.8 ms, simple1: 247.3 ms]
Latency median : 39.0 ms [insert: 2.0 ms, range1: 229.9 ms, simple1: 229.0 ms]
Latency 95th percentile : 480.8 ms [insert: 3.5 ms, range1: 567.8 ms, simple1:
568.3 ms]
Latency 99th percentile : 682.6 ms [insert: 22.1 ms, range1: 752.4 ms,
simple1: 753.4 ms]
Latency 99.9th percentile : 907.0 ms [insert: 51.3 ms, range1: 966.8 ms,
simple1: 966.8 ms]
Latency max : 1493.2 ms [insert: 695.2 ms, range1: 1,141.9 ms, simple1:
1,493.2 ms]
Total partitions : 2,698,021 [insert: 1,310,787, range1: 78,296, simple1:
1,308,938]
Total errors : 0 [insert: 0, range1: 0, simple1: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 02:00:01|
|SSTable count|1342
1262
1345
1288
1337|746
753
749
742
745|
I did want to test stress performance while running repair as well, because we
believe RACS will bring a lot of benefit to repair, but so far in 3 out of 3
tries I haven't managed to get repair working; it looks like I've been running
into https://issues.apache.org/jira/browse/CASSANDRA-13938.
Let me know if you would like any more benchmarks. I've got the Terraform
scripts to set up multi-DC, so it should be reasonably straightforward to run
more tests (or a different form of test).
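For reference, the kind of repair I was trying to run alongside stress looks roughly like this (a sketch; the exact flags in my attempts may have differed, and this is where CASSANDRA-13938 bites):
{code:java}
# Rough sketch of the repair attempted alongside stress (assumption: a full
# repair of the stress keyspace, kicked off on one node; exact flags in my
# attempts may have differed).
nohup nodetool repair -full stresscql2 > repair.log 2>&1 &
{code}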
> RangeAwareCompaction
> --------------------
>
> Key: CASSANDRA-10540
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10540
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Major
> Labels: compaction, lcs, vnodes
> Fix For: 4.x
>
>
> Broken out from CASSANDRA-6696, we should split sstables based on ranges
> during compaction.
> Requirements:
> * don't create tiny sstables - keep them bunched together until a single vnode
> is big enough (configurable how big that is)
> * make it possible to run existing compaction strategies on the per-range
> sstables
> We should probably add a global compaction strategy parameter that states
> whether this should be enabled or not.