[ https://issues.apache.org/jira/browse/CASSANDRA-4784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480157#comment-13480157 ]

sankalp kohli commented on CASSANDRA-4784:
------------------------------------------

The difference will be marginal: only the data in the memtable that has not 
yet been flushed. We can copy all the sstables present on the replica and keep 
a View of the sstables already copied. If any sstables are added or deleted in 
the meantime, we can do another sync. 
The remaining diff will then be only the memtable contents, so we can run a 
repair just as we do today after a bootstrap (a rough sketch of this catch-up 
loop is below). 
The main advantage is the speed of recovery for a node, especially one with a 
lot of data; today that is bound by application-level streaming. The node 
serving the data also would not have to do any work at the application level. 
Another small benefit is that no objects are created in the JVM while 
transferring data. 
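
Here is a rough sketch, in plain Java, of the catch-up loop described above. 
It is illustrative only, not Cassandra's actual streaming code; Replica, Node, 
listSstables(), copySstable() and runRepair() are placeholder names standing 
in for the real streaming and repair machinery.

    import java.util.HashSet;
    import java.util.Set;

    public class SstableCatchUp {

        // Placeholder interfaces standing in for the real streaming/repair machinery.
        interface Replica {
            Set<String> listSstables();
            void copySstable(String name, Node target) throws Exception;
        }

        interface Node {
            void runRepair();
        }

        public void rebuildFrom(Replica source, Node target) throws Exception {
            Set<String> copied = new HashSet<>();   // the "View" of sstables already transferred
            while (true) {
                // Snapshot the sstables currently live on the source replica,
                // minus whatever we have already transferred.
                Set<String> pending = new HashSet<>(source.listSstables());
                pending.removeAll(copied);
                if (pending.isEmpty()) {
                    break;  // no new sstables appeared during the last pass
                }
                for (String sstable : pending) {
                    source.copySstable(sstable, target);  // raw file copy, no per-row work
                    copied.add(sstable);
                }
                // Loop again: flushes and compactions may have added or removed
                // sstables on the source in the meantime.
            }
            // The only remaining difference is unflushed memtable data; a normal
            // repair (as after bootstrap today) closes that gap.
            target.runRepair();
        }
    }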
                
> Create separate sstables for each token range handled by a node
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-4784
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4784
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: sankalp kohli
>            Priority: Minor
>              Labels: perfomance
>
> Currently, each sstable contains data for all the ranges the node is 
> handling. If we instead keep separate sstables for each range the node 
> handles, it can lead to several improvements.
> Improvements:
> 1) Node rebuild will be very fast, since sstables can be copied directly to 
> the bootstrapping node with minimal application-level logic. We can use Linux 
> native zero-copy transfer (e.g. sendfile) to move sstables with very little 
> CPU, putting less pressure on the serving node. In theory this should be the 
> fastest way to transfer the data (see the sketch after this list). 
> 2) Backups need to transfer only the sstables that belong to a node's primary 
> key range. 
> 3) An ETL process can copy just one replica of the data and will be much 
> faster. 
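
As a rough illustration of the zero-copy transfer in point 1, the sketch below 
uses Java NIO's FileChannel.transferTo, which on Linux is backed by sendfile 
and moves file bytes to a socket without pulling them through the JVM heap. 
The sstable path, host and port are placeholders, not actual Cassandra 
configuration.

    import java.io.IOException;
    import java.net.InetSocketAddress;
    import java.nio.channels.FileChannel;
    import java.nio.channels.SocketChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class ZeroCopySstableSend {
        public static void main(String[] args) throws IOException {
            // Placeholder sstable path and destination; not real configuration.
            try (FileChannel file = FileChannel.open(
                     Paths.get("/var/lib/cassandra/data/ks/cf/example-Data.db"),
                     StandardOpenOption.READ);
                 SocketChannel socket = SocketChannel.open(
                     new InetSocketAddress("target-node", 7000))) {
                long position = 0;
                long size = file.size();
                while (position < size) {
                    // transferTo maps to sendfile(2) on Linux: bytes move
                    // kernel-to-socket without entering the JVM heap, so the
                    // serving node burns little CPU and allocates no objects.
                    position += file.transferTo(position, size - position, socket);
                }
            }
        }
    }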
> Changes:
> We can split writes into a separate memtable for each range the node handles. 
> The sstables flushed from these memtables can record which range of data they 
> cover (a sketch of this routing follows the quoted description below). 
> I think reads need no change, since they already work with interleaved data, 
> but maybe we can improve there as well? 
> Complexities:
> The change does not look very complicated, though I am not taking into 
> account how it will work when a node's ranges change. 
> Vnodes might make this more complicated. We could also add a bit to each 
> sstable indicating whether it holds primary-range data or not. 
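
To make the proposed write path concrete, here is a minimal sketch of routing 
each write to one in-memory buffer per owned token range, so every flushed 
sstable covers exactly one range. The Range and PerRangeMemtables types are 
invented for illustration and simplify real token ranges (half-open 
[left, right) intervals, no wraparound, byte[] values); they are not 
Cassandra's Memtable or ColumnFamilyStore classes.

    import java.util.HashMap;
    import java.util.Map;
    import java.util.TreeMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Hypothetical sketch: one in-memory buffer ("memtable") per owned token range.
    public class PerRangeMemtables {

        // Simplified token range [left, right); real ranges are (left, right]
        // with wraparound, which is ignored here.
        static final class Range {
            final long left, right;
            Range(long left, long right) { this.left = left; this.right = right; }
            boolean contains(long token) { return token >= left && token < right; }
        }

        // Owned ranges sorted by left token, plus one sorted buffer per range.
        private final TreeMap<Long, Range> ranges = new TreeMap<>();
        private final Map<Long, ConcurrentSkipListMap<Long, byte[]>> memtables = new HashMap<>();

        public PerRangeMemtables(Iterable<Range> ownedRanges) {
            for (Range r : ownedRanges) {
                ranges.put(r.left, r);
                memtables.put(r.left, new ConcurrentSkipListMap<>());
            }
        }

        // Route a write to the memtable of the range that owns its token.
        public void write(long token, byte[] value) {
            Map.Entry<Long, Range> e = ranges.floorEntry(token);
            if (e == null || !e.getValue().contains(token)) {
                throw new IllegalArgumentException("token " + token + " not owned by this node");
            }
            memtables.get(e.getKey()).put(token, value);
        }

        // On flush, each buffer becomes one sstable covering exactly one range,
        // so whole ranges can later be handed to another node as plain files.
        public Map<Long, ConcurrentSkipListMap<Long, byte[]>> flushCandidates() {
            return memtables;
        }
    }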

