https://issues.apache.org/jira/browse/CASSANDRA-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486081#comment-13486081

Benjamin Coverston commented on CASSANDRA-4784:
-----------------------------------------------

I have a working implementation of this for STCS. One issue: because I put the 
implementation inside CompactionTask, it has the unfortunate (or fortunate) 
side effect of also partitioning the SSTables for LCS, making the already-small 
LCS SSTables much smaller.
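
Roughly, the partitioning step looks like the sketch below. Row, the long 
tokens, and partitionByRange are illustrative stand-ins, not the real 
SSTableWriter plumbing:

    import java.util.*;

    class RangeAwareCompactionSketch {
        // Stand-in for a compacted partition; token is its ring position.
        record Row(long token, String payload) {}

        // Route the merged compaction output into one bucket per local
        // token range, so each flushed sstable covers exactly one range.
        // upperBounds are the sorted upper bounds of the node's ranges;
        // the last bound is assumed to be Long.MAX_VALUE so every token
        // maps somewhere.
        static List<List<Row>> partitionByRange(Iterator<Row> merged,
                                                long[] upperBounds) {
            List<List<Row>> buckets = new ArrayList<>();
            for (int i = 0; i < upperBounds.length; i++)
                buckets.add(new ArrayList<>());
            while (merged.hasNext()) {
                Row row = merged.next();
                int idx = Arrays.binarySearch(upperBounds, row.token());
                if (idx < 0) idx = -idx - 1;  // first bound >= token
                buckets.get(idx).add(row);    // that range's output sstable
            }
            return buckets;
        }
    }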

I feel like this puts us at a crossroads: Should we create a completely 
partitioned data strategy for vnodes (a directory per vnode), or should we 
continue to mix the data files in a single data directory?

L0 to L1 compactions become particularly hairy under a per-vnode layout unless 
we first partition the L0 SSTables by vnode range and then compact each 
partitioned L0 slice with that vnode's L1 (sketched below).
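
A minimal sketch of that two-phase flow, using stand-in types (Range, Row, 
SSTable) rather than the actual LeveledCompactionStrategy code; real compaction 
would also reconcile duplicate rows rather than just merge by token:

    import java.util.*;
    import java.util.stream.*;

    class L0PartitionSketch {
        record Range(long left, long right) {            // (left, right]
            boolean contains(long t) { return t > left && t <= right; }
        }
        record Row(long token, String payload) {}
        record SSTable(List<Row> rows) {}

        // Phase 1: trim an unpartitioned L0 sstable down to one vnode's range.
        static SSTable sliceToRange(SSTable table, Range r) {
            return new SSTable(table.rows().stream()
                                    .filter(row -> r.contains(row.token()))
                                    .toList());
        }

        // Phase 2: merge the vnode's L0 slices with its L1 tables; the merge
        // never crosses a range boundary.
        static SSTable compactForVnode(List<SSTable> level0,
                                       List<SSTable> level1, Range r) {
            List<Row> merged = Stream.concat(level0.stream(), level1.stream())
                .map(t -> sliceToRange(t, r))
                .flatMap(t -> t.rows().stream())
                .sorted(Comparator.comparingLong(Row::token))
                .toList();
            return new SSTable(merged);
        }
    }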

                
> Create separate sstables for each token range handled by a node
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-4784
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4784
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: sankalp kohli
>            Assignee: Benjamin Coverston
>            Priority: Minor
>              Labels: performance
>
> Currently, each sstable contains data for all the ranges that the node is 
> handling. If we instead create separate sstables for each range the node 
> handles, it can lead to some improvements.
> Improvements:
> 1) Node rebuild will be very fast, as sstables can be copied directly to the 
> bootstrapping node. It minimizes application-level logic: we can use Linux 
> zero-copy primitives to transfer sstables without burning CPU, putting less 
> pressure on the serving node. In theory it is the fastest way to transfer 
> the data (see the sketch after this list).
> 2) Backups can transfer only the sstables that belong to the node's primary 
> key range.
> 3) An ETL process can copy just one replica of the data and will be much faster.
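
As an aside on point 1: the Linux zero-copy path presumably means sendfile(2), 
which Java exposes as FileChannel.transferTo. A minimal sketch, where the 
sstable path and peer address are placeholders:

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    class ZeroCopySend {
        // Stream an sstable to a peer without copying its bytes through
        // user space; on Linux, transferTo is backed by sendfile(2).
        static void send(Path sstable, InetSocketAddress peer) throws IOException {
            try (FileChannel src = FileChannel.open(sstable, StandardOpenOption.READ);
                 SocketChannel out = SocketChannel.open(peer)) {
                long position = 0, size = src.size();
                while (position < size)
                    position += src.transferTo(position, size - position, out);
            }
        }
    }
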
> Changes:
> We can split writes into multiple memtables, one per range the node handles. 
> The sstables flushed from these can record which range of data they cover 
> (a routing sketch follows at the end of this description).
> I think reads need no change, since they already work with interleaved data 
> anyway, but maybe we can improve there as well.
> Complexities:
> The change does not look very complicated, though I am not accounting for 
> what happens when a node's ranges change.
> Vnodes may make this more complicated. We could also add a bit to each 
> sstable indicating whether it holds primary data or not.
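
A routing sketch for the memtable split described under "Changes" above. 
Memtable here is a bare stand-in, not the real org.apache.cassandra.db.Memtable:

    import java.util.*;
    import java.util.concurrent.ConcurrentSkipListMap;

    class PerRangeMemtables {
        // Bare stand-in for a memtable: a sorted map from token to value.
        static final class Memtable {
            final NavigableMap<Long, String> rows = new ConcurrentSkipListMap<>();
        }

        // One memtable per local range, keyed by the range's upper bound,
        // with Long.MAX_VALUE as the final bound so every token maps.
        final NavigableMap<Long, Memtable> byUpperBound = new TreeMap<>();

        PerRangeMemtables(long[] upperBounds) {
            for (long bound : upperBounds)
                byUpperBound.put(bound, new Memtable());
        }

        // Route a write to the memtable owning the token's range; at flush
        // time each memtable becomes one sstable tagged with its range.
        void write(long token, String value) {
            byUpperBound.ceilingEntry(token).getValue().rows.put(token, value);
        }
    }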
