[ 
https://issues.apache.org/jira/browse/CASSANDRA-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575153#comment-13575153
 ] 

Jouni Hartikainen commented on CASSANDRA-4784:
----------------------------------------------

I'm not sure I understood this correctly, but wouldn't this change 
lead to memtable flushes creating much more random I/O than before? 
Especially with vnodes, wouldn't the incoming data be spread across num_tokens 
files per CF instead of one per CF? Wouldn't this affect compactions as well? 
E.g. for the default size-tiered strategy, instead of compacting 4 larger SSTables 
into one even larger SSTable per CF, we would be compacting num_tokens * 4 smaller 
files into num_tokens larger ones per CF.

Am I missing something here?
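
To put rough numbers on the concern above, here is a minimal sketch of the
file-count arithmetic. The values are assumptions for illustration, not taken
from the patch: num_tokens = 256 and a size-tiered threshold of 4 SSTables per
compaction. It also assumes every token range receives writes between flushes.

```python
# Rough file-count arithmetic for one column family (CF).
# Assumed values (not from the patch): 256 vnodes, compaction threshold of 4.
NUM_TOKENS = 256
COMPACTION_THRESHOLD = 4


def flush_files(per_range):
    """Number of SSTables written by a single memtable flush."""
    # One file per token range if sstables are split by range, else one file.
    return NUM_TOKENS if per_range else 1


def compaction_io(per_range):
    """(input files, output files) once every bucket reaches the threshold."""
    buckets = NUM_TOKENS if per_range else 1
    return buckets * COMPACTION_THRESHOLD, buckets


if __name__ == "__main__":
    print("single sstable per flush:", flush_files(False), compaction_io(False))
    print("one sstable per range:   ", flush_files(True), compaction_io(True))
```

The total bytes written are the same in both cases; what changes is how many
separate files those bytes are spread over.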
                
> Create separate sstables for each token range handled by a node
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-4784
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4784
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.2.0 beta 1
>            Reporter: sankalp kohli
>            Assignee: Benjamin Coverston
>            Priority: Minor
>              Labels: perfomance
>             Fix For: 2.0
>
>         Attachments: 4784.patch
>
>
> Currently, each sstable has data for all the ranges that the node is handling. If 
> we change that and instead have separate sstables for each range that the node is 
> handling, it can lead to some improvements.
> Improvements:
> 1) Node rebuild will be very fast, as sstables can be copied directly to 
> the bootstrapping node. This minimizes application-level logic. We can 
> use Linux native methods to transfer sstables without using CPU, 
> putting less pressure on the serving node. I think in theory it will be the 
> fastest way to transfer data. 
> 2) Backup can transfer only the sstables that belong to a node's primary 
> key range. 
> 3) An ETL process can copy only one replica of the data and will be much faster. 
> Changes:
> We can split the writes into multiple memtables, one for each range the node is 
> handling. The sstables flushed from these can record which 
> range of data they hold.
> I think there will be no change for reads, as they work with interleaved 
> data anyway. But maybe we can improve there as well? 
> Complexities:
> The change does not look very complicated. I am not taking into account how 
> it will work when ranges are being changed for nodes. 
> Vnodes might make this work more complicated. We can also have a bit on each 
> sstable which says whether it is primary data or not. 
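
The per-range memtable split described in the quoted proposal could look
roughly like the sketch below. This is an illustration only, not the code in
4784.patch: the class and method names are hypothetical, tokens are plain
integers, each range is identified by its end token, and every token written
to the node is assumed to fall in a range the node handles.

```python
import bisect
from collections import defaultdict


class RangePartitionedMemtables:
    """Hypothetical sketch: one in-memory table per token range."""

    def __init__(self, range_ends):
        # Each range is (previous_end, end]; it is identified by its end token.
        self.range_ends = sorted(range_ends)
        self.memtables = defaultdict(dict)

    def range_for(self, token):
        # First range whose end token is >= token; wrap around past the last.
        i = bisect.bisect_left(self.range_ends, token)
        return self.range_ends[i % len(self.range_ends)]

    def write(self, token, key, value):
        self.memtables[self.range_for(token)][key] = value

    def flush(self):
        # One "sstable" (here a sorted list of rows) per non-empty range,
        # keyed by the range it covers.
        sstables = {
            end: sorted(rows.items())
            for end, rows in self.memtables.items()
            if rows
        }
        self.memtables.clear()
        return sstables


if __name__ == "__main__":
    m = RangePartitionedMemtables([100, 200, 300])
    m.write(50, "a", 1)
    m.write(150, "c", 3)
    m.write(350, "d", 4)  # wraps around to the range ending at 100
    print(m.flush())
```

A flush then produces one file per range that received writes, which is the
source of the extra small files raised in the comment above.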

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira