[
https://issues.apache.org/jira/browse/CASSANDRA-12591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15458913#comment-15458913
]
Edward Capriolo commented on CASSANDRA-12591:
---------------------------------------------
I would argue that a spindle-based system is no longer the common case and
that most deployments are SSD-based. Compaction time also varies with write
patterns; important factors include: unique partitions, cells per row, number
of overwrites, % tombstones, and % TTL'd data. I mention this because I have
seen benchmark data that is impressive but not always applicable to real-world
data.
> Re-evaluate the default 160MB sstable_size_in_mb choice in LCS
> --------------------------------------------------------------
>
> Key: CASSANDRA-12591
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12591
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Wei Deng
> Labels: lcs
>
> There has been some effort from CASSANDRA-5727 in benchmarking and evaluating
> the best max_sstable_size used by LeveledCompactionStrategy, and the
> conclusion derived from that effort was to use 160MB as the optimal size
> for both throughput (i.e. the time spent on compaction, the smaller the
> better) and the number of bytes compacted (to avoid write amplification, the
> less the better).
> However, when I read more into that test report (the short
> [comment|https://issues.apache.org/jira/browse/CASSANDRA-5727?focusedCommentId=13722571&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13722571]
> describing the tests), I realized it was conducted on hardware with the
> following configuration: "a single rackspace node with 2GB of ram." I'm not
> sure whether this was an acceptable hardware configuration for a production
> Cassandra deployment at that time (mid-2013), but it is definitely far below
> today's hardware standards.
> Given that we now have compaction-stress, which can generate SSTables based
> on a user-defined stress profile with a user-defined table schema and
> compaction parameters (compatible with cassandra-stress), it would be a
> useful effort to revisit this number using a more realistic hardware
> configuration and see if 160MB is still the optimal choice. It might also
> impact our perceived "practical" node density with LCS nodes if it turns out
> a bigger max_sstable_size actually works better, as it would allow fewer
> SSTables (and hence fewer levels and less write amplification) per node at
> higher density.
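As a back-of-the-envelope sketch (not from the ticket) of the fewer-levels
argument above: assuming LCS's usual fan-out of 10 (L1 holds ~10 sstables,
L2 ~100, and so on, ignoring L0), the level count needed for a given node
density can be estimated as follows. The 2 TB density and 1 GB sstable size
are illustrative values, not numbers from the ticket.

```python
def lcs_levels(node_density_gb, sstable_size_mb, fanout=10):
    """Estimate how many LCS levels are needed to hold the given data,
    assuming level L holds fanout**L sstables of sstable_size_mb each."""
    data_mb = node_density_gb * 1024
    cumulative_mb = 0
    level = 0
    while cumulative_mb < data_mb:
        level += 1
        cumulative_mb += (fanout ** level) * sstable_size_mb
    return level

# A hypothetical 2 TB node: default 160 MB sstables vs. 1 GB sstables.
print(lcs_levels(2048, 160))    # 5 levels (~13k sstables of 160 MB)
print(lcs_levels(2048, 1024))   # 4 levels (~2k sstables of 1 GB)
```

Since each level compacted into adds roughly one unit of write amplification,
dropping a level by using larger sstables would reduce total compaction work
per node, which is the effect the description speculates about.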
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)