[
https://issues.apache.org/jira/browse/CASSANDRA-10540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15424092#comment-15424092
]
Marcus Eriksson commented on CASSANDRA-10540:
---------------------------------------------
These "benchmarks" have been run using cassandra-stress with
[this|https://paste.fscking.us/display/jKc0X89MLFzHE9jhRqQ5xfvRHeU] yaml (only
modified per run with the different compaction configurations).
cassandra-stress generates 40GB of data and then it compacts those sstables
using 8 threads. All tests were run with 256 tokens on my machine (2 ssds, 32GB
ram):
{code}
./tools/bin/compaction-stress write -d /var/lib/cassandra -d /home/marcuse/cassandra -g 40 -p blogpost-range.yaml -t 4 -v 256
./tools/bin/compaction-stress compact -d /var/lib/cassandra -d /home/marcuse/cassandra -p blogpost-range.yaml -t 8 -v 256
{code}
First a baseline: it takes about 7 minutes to compact 40GB of data with STCS,
and we get a write amplification (compaction bytes written / size before) of
about 1.46 (a quick check of that number follows the table below).
* 40GB + STCS
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|42986704571|31305948786|62268272752|7:44|26|
|43017694284|31717603488|62800073327|7:04|26|
|42863193047|31244649872|64673778727|6:44|26|
|42962733336|31842455113|62985984309|6:14|26|
|43107421526|32526047125|61657717328|6:04|26|
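A quick standalone check of that baseline figure, averaging compaction bytes written / size before over the five STCS runs in the table (plain Java, nothing from the patch):
{code}
// Write amplification = compaction bytes written / size before,
// averaged over the five STCS baseline runs above.
public class BaselineWriteAmplification {
    public static void main(String[] args) {
        long[] sizeBefore = {42986704571L, 43017694284L, 42863193047L, 42962733336L, 43107421526L};
        long[] bytesWritten = {62268272752L, 62800073327L, 64673778727L, 62985984309L, 61657717328L};
        double sum = 0;
        for (int i = 0; i < sizeBefore.length; i++) {
            sum += (double) bytesWritten[i] / sizeBefore[i];
        }
        // prints ~1.46, matching the baseline figure quoted above
        System.out.printf("average write amplification: %.2f%n", sum / sizeBefore.length);
    }
}
{code}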
With range aware compaction and a small min_range_sstable_size_in_mb we compact
more slowly, taking about 2x the time, but the end result is smaller, with slightly
lower write amplification (1.44). The reason for the longer time is that we have to
do a lot more tiny compactions per vnode (some rough per-vnode arithmetic follows
the table below). The reason for the smaller size after the compactions is that we
are much more likely to compact overlapping sstables together as we compact within
each vnode.
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 1
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|42944940703|25352795435|61734295478|13:18|286|
|42896304174|25830662102|62049066195|15:45|287|
|43091495756|24811367911|61448601743|12:25|287|
|42961529234|26275106863|63118850488|13:17|284|
|42902111497|25749453764|61529524300|13:54|280|
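To put the roughly tenfold jump in compaction count in perspective, some rough per-vnode arithmetic (plain Java, not part of the patch):
{code}
// 40GB spread over 256 vnode ranges is ~160MB per range, so with
// min_range_sstable_size_in_mb: 1 essentially every range ends up with its
// own per-range STCS instance and its own stream of small compactions
// (~285 compactions above versus 26 for the STCS baseline).
public class PerRangeArithmetic {
    public static void main(String[] args) {
        long totalBytes = 40L * 1024 * 1024 * 1024;
        int vnodes = 256;
        long perRangeMb = totalBytes / vnodes / (1024 * 1024);
        System.out.printf("data per vnode range: ~%d MB%n", perRangeMb); // ~160 MB
    }
}
{code}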
As we increase min_range_sstable_size_in_mb, the time spent is reduced, the
size after compaction increases, and the number of compactions drops, since we
don't promote sstables to the per-vnode strategies as quickly. With a large
enough min_range_sstable_size_in_mb the behaviour will be the same as STCS
(plus a small overhead for estimating the size of the next vnode range during
compaction); a sketch of that threshold check follows the tables below.
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 5
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|43071111106|27586259306|62855258024|10:35|172|
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 10
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|42998501805|28281735688|65469323764|9:45|109|
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 20
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|42801860659|28934194973|66554340039|10:05|48|
* 40GB + STCS + range_aware + min_range_sstable_size_in_mb: 50
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|42881416448|30352758950|61223610818|7:25|27|
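To illustrate the threshold check, a minimal sketch of the decision min_range_sstable_size_in_mb controls; the class and method names here are made up for illustration and are not the actual code in the patch:
{code}
// Illustrative sketch (invented names): while writing compaction output, the
// size of the next vnode range is estimated, and only if that estimate reaches
// min_range_sstable_size_in_mb is a separate per-range sstable cut and handed
// to that range's own strategy; otherwise the data stays in the default
// size-tiered pool. A very large threshold therefore degenerates to plain STCS.
public class RangeSplitDecision {
    static boolean cutSeparateRangeSstable(long estimatedBytesInNextRange, int minRangeSstableSizeMb) {
        return estimatedBytesInNextRange >= (long) minRangeSstableSizeMb * 1024 * 1024;
    }

    public static void main(String[] args) {
        long estimatedBytesInNextRange = 12L * 1024 * 1024; // say ~12MB estimated for the next range
        for (int thresholdMb : new int[]{1, 5, 10, 20, 50}) {
            System.out.printf("min_range_sstable_size_in_mb=%d -> separate per-range sstable: %b%n",
                    thresholdMb, cutSeparateRangeSstable(estimatedBytesInNextRange, thresholdMb));
        }
    }
}
{code}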
With LCS and a small sstable_size_in_mb we see a huge difference with range
aware compaction, due to the number of compactions needed to establish the
leveling without it. With range aware compaction we get fewer levels in each
vnode range, which is much quicker to compact (a back-of-the-envelope level
count follows the tables below). Write amplification is about 2.0 with range
aware and 3.4 without.
* 40GB + LCS + sstable_size_in_mb: 10 + range_aware + min_range_sstable_size_in_mb: 10
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|43170254812|26511935628|87637370434|19:55|903|
|43015904097|26100197485|83125478305|14:45|854|
|43188886684|25651102691|87520409116|19:55|920|
* 40GB + LCS + sstable_size_in_mb: 10
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|43099495889|23876144309|139000531662|28:25|3751|
|42811000078|24620085107|147722973544|30:35|3909|
|42879141849|24479485292|146194679395|30:46|3882|
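For intuition on the level counts, some back-of-the-envelope arithmetic (assuming the usual LCS fanout of 10 and level 1 sized at 10 sstables; this is a sketch, not code from the patch):
{code}
// Smallest number of levels whose combined capacity holds the data,
// with level 1 = fanout * sstable_size and each level fanout times bigger.
public class LcsLevels {
    static int levelsNeeded(long dataBytes, long sstableBytes, int fanout) {
        long levelCapacity = fanout * sstableBytes;
        long total = 0;
        int levels = 0;
        while (total < dataBytes) {
            total += levelCapacity;
            levelCapacity *= fanout;
            levels++;
        }
        return levels;
    }

    public static void main(String[] args) {
        long sstable = 10L * 1024 * 1024;            // sstable_size_in_mb: 10
        long wholeTable = 40L * 1024 * 1024 * 1024;  // leveling the whole 40GB
        long perVnodeRange = wholeTable / 256;       // ~160MB per vnode range
        System.out.println("levels for the whole table: " + levelsNeeded(wholeTable, sstable, 10));    // 4
        System.out.println("levels per vnode range:     " + levelsNeeded(perVnodeRange, sstable, 10)); // 2
    }
}
{code}
Under those assumptions the whole table needs around four levels while each vnode range needs only about two, which matches the intuition that the per-range leveling is much cheaper to establish.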
If we bump the LCS sstable_size_in_mb to the default (160) we get more similar
results. Write amplification is smaller with range aware compaction, but the
size after is also bigger. The reason for the bigger size once compaction has
settled is that we run with a bigger min_range_sstable_size_in_mb, which means
more data stays out of the per-range compaction strategies and is therefore
only size tiered. This probably also explains the reduced write amplification:
2.0 with range aware and 2.3 without (a worked check follows the tables below).
* 40GB + LCS + sstable_size_in_mb: 160 + range_aware + min_range_sstable_size_in_mb: 20
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|42970784099|27044941599|85933586287|12:55|180|
|42953512565|26229232777|82158863291|11:36|155|
|43028281629|26025950993|86704157660|11:25|177|
* 40GB + LCS + sstable_size_in_mb: 160
||size before (bytes)||size after (bytes)||compaction bytes written||time (mm:ss)||number of compactions||
|43120992697|24487560567|100347633105|12:25|151|
|42854926611|24466503628|102492898148|10:55|155|
|42919253642|24831918330|100902215961|12:15|161|
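The same write-amplification check as for the STCS baseline, applied to the two sstable_size_in_mb: 160 tables above:
{code}
// Average write amplification (compaction bytes written / size before)
// for the two LCS + sstable_size_in_mb: 160 configurations above.
public class LcsWriteAmplification {
    static double averageWa(long[] sizeBefore, long[] bytesWritten) {
        double sum = 0;
        for (int i = 0; i < sizeBefore.length; i++) {
            sum += (double) bytesWritten[i] / sizeBefore[i];
        }
        return sum / sizeBefore.length;
    }

    public static void main(String[] args) {
        long[] rangeAwareBefore = {42970784099L, 42953512565L, 43028281629L};
        long[] rangeAwareWritten = {85933586287L, 82158863291L, 86704157660L};
        long[] plainBefore = {43120992697L, 42854926611L, 42919253642L};
        long[] plainWritten = {100347633105L, 102492898148L, 100902215961L};
        System.out.printf("range aware: %.2f%n", averageWa(rangeAwareBefore, rangeAwareWritten)); // ~1.98
        System.out.printf("plain LCS:   %.2f%n", averageWa(plainBefore, plainWritten));           // ~2.36
    }
}
{code}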
> RangeAwareCompaction
> --------------------
>
> Key: CASSANDRA-10540
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10540
> Project: Cassandra
> Issue Type: New Feature
> Reporter: Marcus Eriksson
> Assignee: Marcus Eriksson
> Labels: compaction, lcs, vnodes
> Fix For: 3.x
>
>
> Broken out from CASSANDRA-6696, we should split sstables based on ranges
> during compaction.
> Requirements:
> * don't create tiny sstables - keep them bunched together until a single vnode is big enough (configurable how big that is)
> * make it possible to run existing compaction strategies on the per-range sstables
> We should probably add a global compaction strategy parameter that states
> whether this should be enabled or not.