[ 
https://issues.apache.org/jira/browse/CASSANDRA-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15457121#comment-15457121
 ] 

Wei Deng edited comment on CASSANDRA-12591 at 9/2/16 12:56 AM:
---------------------------------------------------------------

Just finished a 100GB compaction test on the same hardware, and it still shows 
that 1280MB sstable size indeed works better than 160MB.

I only had time to finish one run and here are the numbers:

1280MB sstable size: 127m2.955s
160MB sstable size: 162m46.877s

So 1280MB max_sstable_size again cuts compaction time by about 22% (roughly 
28% higher compaction throughput for the same 100GB of data).
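
The percentage can be checked from the two wall-clock times above. The snippet 
below is only a sketch: it assumes both runs compacted the same amount of data, 
so that throughput is inversely proportional to elapsed time, and the function 
and variable names are illustrative rather than taken from any Cassandra tool.

```python
import re


def parse_duration(text):
    """Convert a `time`-style duration such as '127m2.955s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Elapsed times reported for the single 100GB run of each setting.
large_run = parse_duration("127m2.955s")   # 1280MB sstable size
small_run = parse_duration("162m46.877s")  # 160MB sstable size

# Fraction of compaction time saved by the larger sstable size.
time_saved = (small_run - large_run) / small_run

# Relative throughput gain, assuming the same bytes were compacted in both runs.
throughput_gain = small_run / large_run - 1

print(f"time saved: {time_saved:.1%}")
print(f"throughput gain: {throughput_gain:.1%}")
```

Measured as time saved, the difference is about 22%; measured as throughput 
(data compacted per unit of time), it is about 28%.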

Next I'm going to run the same 100GB tests on an SSD-based environment (Amazon 
i2.xlarge) to see whether the same advantage holds.


was (Author: weideng):
Just finished a 100GB compaction test on the same hardware, and it still shows 
that 1280MB sstable size works much better than 160MB.

I only had time to finish one run and here are the numbers:

1280MB sstable size: 127m2.955s
160MB sstable size: 162m46.877s

So 1280MB max_sstable_size again cuts compaction time by about 22% (roughly 
28% higher compaction throughput for the same 100GB of data).

Next I'm going to run the same 100GB tests on a SSD-based environment (Amazon 
i2.xlarge) to see if the same advantage still remains.

> Re-evaluate the default 160MB sstable_size_in_mb choice in LCS
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-12591
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12591
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction
>            Reporter: Wei Deng
>              Labels: lcs
>
> There has been some effort in CASSANDRA-5727 to benchmark and evaluate 
> the best max_sstable_size for LeveledCompactionStrategy, and the 
> conclusion from that effort was that 160MB is the optimal size 
> for both throughput (i.e. the time spent on compaction, the smaller the 
> better) and the amount of bytes compacted (to avoid write amplification, the 
> less the better).
> However, when I read more into that test report (the short 
> [comment|https://issues.apache.org/jira/browse/CASSANDRA-5727?focusedCommentId=13722571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13722571]
>  describing the tests), I realized it was conducted on hardware with the 
> following configuration: "a single rackspace node with 2GB of ram." I'm not 
> sure if this was an acceptable hardware configuration for a production 
> Cassandra deployment at that time (mid-2013), but it is definitely far below 
> today's hardware standards.
> Given that we now have compaction-stress, which is able to generate SSTables 
> based on a user-defined stress profile with a user-defined table schema and 
> compaction parameters (compatible with cassandra-stress), it would be 
> worthwhile to revisit this number using a more realistic hardware 
> configuration and see if 160MB is still the optimal choice. It might also 
> change our perceived "practical" node density for LCS nodes if it turns out 
> that a bigger max_sstable_size actually works better, as it would mean fewer 
> SSTables (and hence fewer levels and less write amplification) per node at 
> higher density.
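
To illustrate the last point of the description, here is a rough sketch of how 
the SSTable count and the number of populated levels could change with the 
SSTable size. It is a simplified model, not Cassandra's actual code: it assumes 
a level fanout of 10 (level N holds up to 10^N SSTables), ignores L0, and uses 
a hypothetical node density of 1,000,000 MB. The function names are made up 
for illustration.

```python
import math

FANOUT = 10  # assumed LCS level size multiplier


def sstable_count(data_mb, sstable_size_mb):
    """Number of SSTables needed to hold data_mb at the given max size."""
    return math.ceil(data_mb / sstable_size_mb)


def levels_needed(data_mb, sstable_size_mb, fanout=FANOUT):
    """Smallest number of levels (L1..Ln) whose combined capacity fits the data."""
    remaining = sstable_count(data_mb, sstable_size_mb)
    level = 0
    while remaining > 0:
        level += 1
        remaining -= fanout ** level  # level N holds up to fanout**N SSTables
    return level


NODE_DATA_MB = 1_000_000  # hypothetical node density, roughly 1TB

for size_mb in (160, 1280):
    count = sstable_count(NODE_DATA_MB, size_mb)
    levels = levels_needed(NODE_DATA_MB, size_mb)
    print(f"{size_mb}MB: {count} SSTables, {levels} levels")
```

Under these assumptions, the 160MB setting needs 6250 SSTables spread over 4 
levels, while 1280MB needs 782 SSTables over 3 levels.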



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
