[ 
https://issues.apache.org/jira/browse/CASSANDRA-7666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325050#comment-15325050
 ] 

Tupshin Harper commented on CASSANDRA-7666:
-------------------------------------------

In addition to being relevant to CASSANDRA-11989, I believe range-segmented
sstables represent an under-appreciated potential optimization for compaction
strategies. As a rule of thumb, we tend to recommend that STCS workloads be
kept under 2TB or so. The main reason for this (besides operational concerns
involving time to bootstrap/repair/etc.) is that STCS compaction performance
scales sublinearly with the amount of data in a table/node, and the write
amplification factor is substantially higher at 10TB than at 2TB. With
range-segmented sstables, just 5 segments would allow 10TB to be isolated into
2TB sections, and as long as the cumulative IO and CPU of the nodes were
sufficient for the total workload, the nodes could sustain performance at that
scale.
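
To make the 10TB versus 2TB comparison above a little more concrete, here is a
rough back-of-the-envelope sketch in Python. It is not a measurement and not
how Cassandra computes anything: it assumes a simplified model in which STCS
merges sstables in groups of 4 (the default min_threshold) and every tier
rewrites each byte once, so write amplification grows roughly with
log4(data size / flush size). The function names and the 256MiB flush size are
hypothetical, chosen only for illustration.

```python
import math

MIB = 2**20
TIB = 2**40


def stcs_write_amplification(data_bytes, flush_bytes=256 * MIB, fan_in=4):
    """Estimate STCS write amplification under a simplified tiering model.

    Assumption: data climbs about log_fan_in(data / flush) tiers and is
    rewritten once per tier, plus the initial flush write.
    """
    if data_bytes <= flush_bytes:
        return 1.0
    return 1.0 + math.log(data_bytes / flush_bytes, fan_in)


def segmented_write_amplification(data_bytes, segments, **kwargs):
    """Estimate write amplification when data is split into equal range
    segments that compact independently (hypothetical model)."""
    return stcs_write_amplification(data_bytes / segments, **kwargs)


if __name__ == "__main__":
    for label, wa in [
        ("2TiB, unsegmented", stcs_write_amplification(2 * TIB)),
        ("10TiB, unsegmented", stcs_write_amplification(10 * TIB)),
        ("10TiB, 5 segments", segmented_write_amplification(10 * TIB, 5)),
    ]:
        print(f"{label}: estimated write amplification {wa:.2f}")
```

Under this model, 10TiB split into 5 segments has the same estimated write
amplification as a single 2TiB table, which is the point of the argument
above. Real-world numbers depend on overwrite patterns, tombstones, and
compaction throughput, none of which this sketch captures.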

I suggest that this ticket be reopened for those two reasons.

> Range-segmented sstables
> ------------------------
>
>                 Key: CASSANDRA-7666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7666
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: CQL
>            Reporter: Jonathan Ellis
>              Labels: dense-storage
>
> It would be useful to segment sstables by data range (not just token range as 
> envisioned by CASSANDRA-6696).
> The primary use case is to allow deleting those data ranges for "free" by 
> dropping the sstables involved.  We should also (possibly as a separate 
> ticket) be able to leverage this information in query planning to avoid 
> unnecessary sstable reads.
> Relational databases typically call this "partitioning" the table, but 
> obviously we use that term already for something else: 
> http://www.postgresql.org/docs/9.1/static/ddl-partitioning.html
> Tokutek's take for mongodb: 
> http://docs.tokutek.com/tokumx/tokumx-partitioned-collections.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
