[
https://issues.apache.org/jira/browse/CASSANDRA-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496077#comment-16496077
]
Lerh Chuan Low edited comment on CASSANDRA-10540 at 5/31/18 4:34 AM:
---------------------------------------------------------------------
Hi [~krummas],
Sorry for the delay; here are some initial benchmarks. I've only tried it with
LCS. This is the stressspec YAML, a reasonably stressful test:
{code:java}
keyspace: stresscql2
keyspace_definition: |
  CREATE KEYSPACE stresscql2 WITH replication = {'class': 'NetworkTopologyStrategy', 'Waboku': 3, 'Bokusapp': 2};

table: typestest
table_definition: |
  CREATE TABLE typestest (
    name text,
    choice boolean,
    date timestamp,
    address inet,
    dbl double,
    lval bigint,
    ival int,
    uid timeuuid,
    value blob,
    PRIMARY KEY((name,choice), date, address, dbl, lval, ival, uid)
  ) WITH compaction = { 'class':'LeveledCompactionStrategy', 'range_aware_compaction':'true', 'min_range_sstable_size_in_mb':'15' }
    AND comment='A table of many types to test wide rows'

columnspec:
  - name: name
    size: uniform(1..1000)
    population: uniform(1..500M)  # the range of unique values to select for the field (default is 100Billion)
  - name: date
    cluster: uniform(20..1000)
  - name: lval
    population: gaussian(1..1000)
    cluster: uniform(1..4)
  - name: value
    size: uniform(100..500)

insert:
  partitions: fixed(1)        # number of unique partitions to update in a single operation
  batchtype: UNLOGGED         # type of batch to use
  select: uniform(1..10)/10   # uniform chance any single generated CQL row will be visited in a partition;

queries:
  simple1:
    cql: select * from typestest where name = ? and choice = ? LIMIT 1
    fields: samerow
  range1:
    cql: select name, choice, uid from typestest where name = ? and choice = ? and date >= ? LIMIT 10
    fields: multirow
  simple2:
    cql: select name, choice, uid from typestest where name = ? and choice = ? LIMIT 1
    fields: samerow           # samerow or multirow (select arguments from the same row, or randomly from all rows in the partition)
{code}
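The Non-RACS runs presumably use the same schema with plain LCS (no range-aware options). For reference, a minimal sketch of switching an existing table between the two configurations; the option names are the ones from the profile above and only exist on the CASSANDRA-10540 branch, so treat this as an assumption about that branch rather than released Cassandra:
{code:java}
# Minimal sketch of toggling the table between the two configurations.
# Assumptions: the option names are the ones used in the profile above and
# only exist on the CASSANDRA-10540 branch, not in released Cassandra.
cqlsh -e "ALTER TABLE stresscql2.typestest WITH compaction = {
  'class': 'LeveledCompactionStrategy',
  'range_aware_compaction': 'true',
  'min_range_sstable_size_in_mb': '15' };"

# Plain LCS baseline (assumed to be what the Non-RACS runs used).
cqlsh -e "ALTER TABLE stresscql2.typestest WITH compaction = {
  'class': 'LeveledCompactionStrategy' };"
{code}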
This was done on a multi-DC cluster in EC2 (400GB SSDs), with 3 nodes in one DC
and 2 nodes in the other. Stress replicates to both DCs.
For inserts:
{code:java}
nohup cassandra-stress user no-warmup profile=stressspec.yaml n=150000000 cl=QUORUM ops\(insert=1\) -node file=nodelist.txt -rate threads=100 -log file=insert.log > nohup.txt &
{code}
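Since the interesting difference between the two runs is compaction behaviour, something like the following can be used to keep an eye on compactions on a node while stress is running; this is just plain nodetool and wasn't part of the commands above:
{code:java}
# Hedged sketch: watch pending/active compactions and recent compaction
# history on a node while the stress run is going. Plain nodetool; nothing
# RACS-specific, and not something the benchmark itself depends on.
watch -n 30 'nodetool compactionstats -H; nodetool compactionhistory | tail -5'
{code}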
We have
|| ||RACS||NonRACS||
|Stress result|Op rate : 8,784 op/s [insert: 8,784 op/s]
Partition rate : 8,784 pk/s [insert: 8,784 pk/s]
Row rate : 8,784 row/s [insert: 8,784 row/s]
Latency mean : 5.4 ms [insert: 5.4 ms]
Latency median : 4.3 ms [insert: 4.3 ms]
Latency 95th percentile : 8.4 ms [insert: 8.4 ms]
Latency 99th percentile : 39.2 ms [insert: 39.2 ms]
Latency 99.9th percentile : 63.3 ms [insert: 63.3 ms]
Latency max : 1506.8 ms [insert: 1,506.8 ms]
Total partitions : 150,000,000 [insert: 150,000,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 04:44:35|Op rate : 8,730 op/s [insert: 8,730 op/s]
Partition rate : 8,730 pk/s [insert: 8,730 pk/s]
Row rate : 8,730 row/s [insert: 8,730 row/s]
Latency mean : 5.4 ms [insert: 5.4 ms]
Latency median : 4.3 ms [insert: 4.3 ms]
Latency 95th percentile : 8.5 ms [insert: 8.5 ms]
Latency 99th percentile : 39.4 ms [insert: 39.4 ms]
Latency 99.9th percentile : 66.1 ms [insert: 66.1 ms]
Latency max : 944.8 ms [insert: 944.8 ms]
Total partitions : 150,000,000 [insert: 150,000,000]
Total errors : 0 [insert: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 04:46:22|
|SSTable count|1339
1259
1342
1285
1333|743
750
747
737
741|
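The SSTable counts are one line per node (5 nodes). A minimal sketch of collecting them, assuming the numbers come from nodetool tablestats on each host in nodelist.txt:
{code:java}
# Hedged sketch: collect the per-node SSTable counts shown above
# (assumption: nodetool tablestats is the source of these numbers).
for host in $(cat nodelist.txt); do
  echo -n "$host "
  ssh "$host" nodetool tablestats stresscql2.typestest | grep 'SSTable count'
done
{code}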
The mixed workload was run after the inserts, so reads are not served straight
from the OS page cache:
{code:java}
nohup cassandra-stress user no-warmup profile=stressspec.yaml duration=2h cl=QUORUM ops\(insert=10,simple1=10,range1=1\) -node file=nodelist.txt -rate threads=50 -log file=mixed.log > nohup.txt &
{code}
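The stress summaries in the tables are the end-of-run figures from the log files named in the commands; a quick way to pull them out:
{code:java}
# Hedged sketch: pull the end-of-run summary lines out of the stress logs
# (file names as in the stress commands above).
grep -E 'Op rate|Partition rate|Row rate|Latency|Total' insert.log mixed.log
{code}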
|| ||RACS||Non RACS||
|Stress result |Op rate : 415 op/s [insert: 197 op/s, range1: 20 op/s, simple1:
198 op/s]
Partition rate : 407 pk/s [insert: 197 pk/s, range1: 12 pk/s, simple1: 198
pk/s]
Row rate : 412 row/s [insert: 197 row/s, range1: 17 row/s, simple1: 198 row/s]
Latency mean : 120.4 ms [insert: 2.3 ms, range1: 227.0 ms, simple1: 227.3 ms]
Latency median : 38.0 ms [insert: 2.0 ms, range1: 207.0 ms, simple1: 207.4 ms]
Latency 95th percentile : 454.6 ms [insert: 3.1 ms, range1: 541.1 ms, simple1:
543.2 ms]
Latency 99th percentile : 673.2 ms [insert: 5.1 ms, range1: 739.2 ms, simple1:
741.3 ms]
Latency 99.9th percentile : 918.0 ms [insert: 43.4 ms, range1: 985.1 ms,
simple1: 975.2 ms]
Latency max : 1584.4 ms [insert: 766.0 ms, range1: 1,426.1 ms, simple1:
1,584.4 ms]
Total partitions : 2,930,512 [insert: 1,419,222, range1: 86,021, simple1:
1,425,269]
Total errors : 0 [insert: 0, range1: 0, simple1: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 02:00:01|Op rate : 382 op/s [insert: 182 op/s, range1:
18 op/s, simple1: 182 op/s]
Partition rate : 375 pk/s [insert: 182 pk/s, range1: 11 pk/s, simple1: 182
pk/s]
Row rate : 379 row/s [insert: 182 row/s, range1: 15 row/s, simple1: 182 row/s]
Latency mean : 130.8 ms [insert: 2.6 ms, range1: 247.8 ms, simple1: 247.3 ms]
Latency median : 39.0 ms [insert: 2.0 ms, range1: 229.9 ms, simple1: 229.0 ms]
Latency 95th percentile : 480.8 ms [insert: 3.5 ms, range1: 567.8 ms, simple1:
568.3 ms]
Latency 99th percentile : 682.6 ms [insert: 22.1 ms, range1: 752.4 ms,
simple1: 753.4 ms]
Latency 99.9th percentile : 907.0 ms [insert: 51.3 ms, range1: 966.8 ms,
simple1: 966.8 ms]
Latency max : 1493.2 ms [insert: 695.2 ms, range1: 1,141.9 ms, simple1:
1,493.2 ms]
Total partitions : 2,698,021 [insert: 1,310,787, range1: 78,296, simple1:
1,308,938]
Total errors : 0 [insert: 0, range1: 0, simple1: 0]
Total GC count : 0
Total GC memory : 0.000 KiB
Total GC time : 0.0 seconds
Avg GC time : NaN ms
StdDev GC time : 0.0 ms
Total operation time : 02:00:01|
|SSTable count|1342
1262
1345
1288
1337|746
753
749
742
745|
I did want to test stress performance while running repair as well, because we
believe RACS will bring a lot of benefit to repair, but so far in 3 out of 3
tries I haven't managed to get repair working; it looks like I've been running
into https://issues.apache.org/jira/browse/CASSANDRA-13938.
Let me know if you would like any more benchmarks. I've got the Terraform
scripts to set up multi-DC, so it should be reasonably straightforward to run
more tests (or a different form of test).
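For reference, the kind of repair I was trying to run alongside stress looks roughly like this (a sketch; the exact flags in my attempts may have differed, and this is where CASSANDRA-13938 bites):
{code:java}
# Rough sketch of the repair attempted alongside stress (assumption: a full
# repair of the stress keyspace, kicked off on one node; exact flags in my
# attempts may have differed).
nohup nodetool repair -full stresscql2 > repair.log 2>&1 &
{code}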
> RangeAwareCompaction
> --------------------
>
> Key: CASSANDRA-10540
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10540
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Priority: Major
> Labels: compaction, lcs, vnodes
> Fix For: 4.x
>
>
> Broken out from CASSANDRA-6696, we should split sstables based on ranges
> during compaction.
> Requirements:
> * don't create tiny sstables - keep them bunched together until a single vnode
> is big enough (configurable how big that is)
> * make it possible to run existing compaction strategies on the per-range
> sstables
> We should probably add a global compaction strategy parameter that states
> whether this should be enabled or not.