Wei Deng created CASSANDRA-12591:
------------------------------------
Summary: Re-evaluate the default 160MB sstable_size_in_mb choice in LCS
Key: CASSANDRA-12591
URL: https://issues.apache.org/jira/browse/CASSANDRA-12591
Project: Cassandra
Issue Type: Improvement
Components: Compaction
Reporter: Wei Deng
CASSANDRA-5727 put some effort into benchmarking and evaluating the best
max_sstable_size for LeveledCompactionStrategy, and the conclusion from that
effort was that 160MB is the optimal size for both throughput (i.e. the time
spent on compaction, the smaller the better) and the number of bytes compacted
(to limit write amplification, the fewer the better).
However, when I read further into that test report, I realized it was conducted
on hardware with the following configuration: "a single rackspace node with 2GB
of ram." I'm not sure whether this was an acceptable hardware configuration for
a production Cassandra deployment at that time (mid-2013), but it is definitely
far below today's hardware standards.
Given that we now have compaction-stress, which can generate SSTables from a
user-defined stress profile with a user-defined table schema and compaction
parameters (compatible with cassandra-stress), it would be worthwhile to
revisit this number on a more realistic hardware configuration and see whether
160MB is still the optimal choice. The result might also change our perceived
"practical" node density for LCS nodes: if a bigger max_sstable_size turns out
to work better, it would allow fewer SSTables (and hence fewer levels and less
write amplification) per node at higher density.
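To make the "fewer SSTables, hence fewer levels" argument concrete, here is a
rough back-of-the-envelope sketch. It assumes a simplified model of LCS that is
not taken from the CASSANDRA-5727 report: a fanout of 10, level N (N >= 1)
holding at most 10^N SSTables, L0 ignored, and every SSTable exactly the
configured size. The function name lcs_layout and the 2 TiB node density are
hypothetical, chosen only for illustration.

```python
import math

FANOUT = 10  # assumed fanout: each level holds ~10x the SSTables of the previous one


def lcs_layout(data_mb, sstable_mb, fanout=FANOUT):
    """Return (sstable_count, levels) for a fully leveled data set.

    Simplified model (an assumption, not Cassandra's actual code):
    level N (N >= 1) holds at most fanout**N SSTables, L0 is ignored,
    and every SSTable is exactly sstable_mb in size.
    """
    sstables = math.ceil(data_mb / sstable_mb)
    levels = 0
    capacity = 0
    # Add levels until their combined capacity can hold every SSTable.
    while capacity < sstables:
        levels += 1
        capacity += fanout ** levels
    return sstables, levels


if __name__ == "__main__":
    node_data_mb = 2 * 1024 * 1024  # hypothetical 2 TiB of data on one node
    for size_mb in (160, 320):
        count, levels = lcs_layout(node_data_mb, size_mb)
        print(f"sstable_size_in_mb={size_mb}: {count} SSTables, {levels} levels")
```

Under these assumptions, 2 TiB at 160MB needs 13108 SSTables across 5 levels,
while 320MB needs 6554 SSTables across 4 levels. Whether the saved level
actually translates into less write amplification on real hardware is exactly
what a compaction-stress run would have to confirm.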