[ https://issues.apache.org/jira/browse/CASSANDRA-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950481#comment-14950481 ]

Jeremiah Jordan edited comment on CASSANDRA-10495 at 10/9/15 2:38 PM:
----------------------------------------------------------------------

I don't think that method (on-the-fly compaction on the source) is going to 
help the strategies that take the biggest hit when this happens: LCS and DTCS. 
For DTCS you would completely lose the splitting of partitions across time 
ranges, and for LCS, how do you pick which level to put the data in?  Within a 
given level the data is already "compacted", so limiting by level wouldn't 
help.  And if you don't pick a level, you lose the benefits of the "streaming 
keeps sstable level" optimizations that were added.

An idea I had about this was to allow streaming to happen by sstable rather 
than by token range.  Each sstable is streamed only once, but the token ranges 
in the file that aren't owned by the receiver are skipped.  You end up with at 
most the same number of files as the starting node had, and for LCS/DTCS those 
files could stay in the same buckets/levels they started in.
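A minimal sketch of the per-sstable idea, written in Python for brevity (Cassandra itself is Java). Everything here is hypothetical: `SSTable`, `TokenRange`, `stream_by_sstable` and the file-counting helpers are illustrative stand-ins, not Cassandra classes. It assumes the receiver's ranges are non-wrapping `(left, right]` intervals, sorted by `left` and non-overlapping, and it reduces each partition to its token. Each sstable is streamed once, partitions outside the owned ranges are skipped, and the level travels with the file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRange:
    """Hypothetical owned range (left, right]; wraparound ignored for simplicity."""
    left: int
    right: int

    def contains(self, token: int) -> bool:
        return self.left < token <= self.right


@dataclass
class SSTable:
    """Hypothetical stand-in: partition tokens in on-disk (sorted) order plus a level."""
    name: str
    level: int
    tokens: list[int]  # sorted ascending


def stream_by_sstable(sstable: SSTable, owned: list[TokenRange]) -> SSTable:
    """Build the single file sent for this sstable: keep only partitions in
    ranges the receiver owns (owned: sorted, non-overlapping) and keep the level."""
    kept = []
    i = 0
    for t in sstable.tokens:
        # Move past owned ranges that end before this token.
        while i < len(owned) and owned[i].right < t:
            i += 1
        if i == len(owned):
            break  # no owned ranges left: skip the rest of the file
        if owned[i].contains(t):
            kept.append(t)
        # otherwise t falls in a gap between owned ranges and is skipped
    return SSTable(sstable.name, sstable.level, kept)


def per_range_file_count(sstables: list[SSTable], owned: list[TokenRange]) -> int:
    """Range-oriented streaming: one file per (sstable, owned range) pair with data."""
    return sum(
        1 for s in sstables for r in owned if any(r.contains(t) for t in s.tokens)
    )


def per_sstable_file_count(sstables: list[SSTable], owned: list[TokenRange]) -> int:
    """Sstable-oriented streaming: at most one file per source sstable."""
    return sum(1 for s in sstables if stream_by_sstable(s, owned).tokens)


# Example: receiver owns three vnode ranges; source has two sstables on different levels.
owned = [TokenRange(0, 10), TokenRange(20, 30), TokenRange(40, 50)]
a = SSTable("a", level=1, tokens=[5, 15, 25, 45])
b = SSTable("b", level=2, tokens=[8, 22, 35, 48])
```

For these two sstables and three owned ranges, range-oriented streaming produces 6 files while the per-sstable variant produces 2, each still tagged with its original level. The sketch scans every token for clarity; a real implementation would presumably seek past unowned ranges using the sstable's index rather than reading them.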


was (Author: jjordan):
I don't think that method (on-the-fly compaction on the source) is going to 
help the strategies that take the biggest hit when this happens: LCS and DTCS. 
For DTCS you would completely lose the splitting of partitions across time 
ranges, and for LCS, how do you pick which level to put the data in?  Within a 
given level the data is already "compacted", so limiting by level wouldn't help.

An idea I had about this was to allow streaming to happen by sstable rather 
than by token range.  Each sstable is streamed only once, but the token ranges 
in the file that aren't owned by the receiver are skipped.  You end up with at 
most the same number of files as the starting node had, and for LCS/DTCS those 
files could stay in the same buckets/levels they started in.

> Improve the way we do streaming with vnodes
> -------------------------------------------
>
>                 Key: CASSANDRA-10495
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10495
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>             Fix For: 3.x
>
>
> Streaming with vnodes usually creates a large number of sstables on the 
> target node. For example, if each source node has 100 sstables and we use 
> num_tokens = 256, a bootstrapping node (to take one case) might get 
> 100*256 = 25,600 sstables.
> One approach could be to do an on-the-fly compaction on the source node, 
> meaning we would only stream out one sstable per range. Note that we will 
> want the compaction strategy to decide how to combine the sstables; for 
> example, LCS will not want to mix sstables from different levels, while STCS 
> can probably just combine everything.
> cc [~yukim]
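With the numbers from the description, 100 source sstables and 256 vnode ranges can mean up to 100*256 = 25,600 files on the receiver. The Python sketch below illustrates the proposed on-the-fly compaction under simplifying assumptions: each partition is reduced to its token, "merging" a partition that appears in several sstables just deduplicates the token (no timestamp reconciliation), and `group_key` is a hypothetical stand-in for a hook through which the compaction strategy would decide which sstables may be combined (LCS by level, STCS all together). None of these names are Cassandra APIs.

```python
import heapq
from collections import defaultdict

# Hypothetical model: an sstable is (level, sorted list of partition tokens);
# a range is a non-wrapping (lo, hi] pair.


def stcs_key(level):
    return 0  # STCS: everything may be combined into one output


def lcs_key(level):
    return level  # LCS: never mix sstables from different levels


def compact_for_range(sstables, rng, group_key):
    """For one range, k-way merge each strategy-defined group into one output."""
    lo, hi = rng
    groups = defaultdict(list)
    for level, tokens in sstables:
        groups[group_key(level)].append([t for t in tokens if lo < t <= hi])
    outputs = {}
    for key, parts in groups.items():
        merged = []
        for t in heapq.merge(*parts):  # merge of already-sorted inputs
            if not merged or merged[-1] != t:  # same partition in several sstables -> one entry
                merged.append(t)
        if merged:  # don't stream an empty file
            outputs[key] = merged
    return outputs


def files_received(sstables, ranges, group_key):
    """Files the receiver gets when each range is compacted on the fly."""
    return sum(len(compact_for_range(sstables, r, group_key)) for r in ranges)


def files_received_uncompacted(sstables, ranges):
    """Files the receiver gets without compaction: one per (sstable, range) pair with data."""
    return sum(
        1
        for _, tokens in sstables
        for lo, hi in ranges
        if any(lo < t <= hi for t in tokens)
    )


# Example: three sstables (levels 0, 1, 1) streamed for three ranges.
sstables = [(0, [3, 12, 25]), (1, [5, 12, 28]), (1, [14, 30])]
ranges = [(0, 10), (10, 20), (20, 30)]
```

In this toy example the uncompacted approach yields 8 files, STCS-style grouping 3 (one per range) and LCS-style grouping 6 (one per level per range), which shows why the grouping decision is left to the strategy.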



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
