[ https://issues.apache.org/jira/browse/CASSANDRA-10495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14950481#comment-14950481 ]
Jeremiah Jordan edited comment on CASSANDRA-10495 at 10/9/15 2:38 PM:
----------------------------------------------------------------------

I don't think that method is going to help for the strategies that take the biggest hit when this happens, LCS and DTCS. For DTCS you would completely lose the splitting of partitions across time ranges, and for LCS how do you pick what level to put the data in? Within a given level the data is already "compacted", so limiting by level wouldn't help. And if you don't pick a level, you lose the benefits of the "streaming keeps sstable level" optimizations that were added.

An idea I had about this was to allow streaming to happen by sstable, not by token range. For a given sstable you stream it only once, but you skip token ranges in the file that aren't owned by the receiver. You end up with at most the same number of files as the source node had, and for LCS/DTCS those files could stay in the same buckets/levels they started in.

was (Author: jjordan):
I don't think that method is going to help for the strategies that take the biggest hit when this happens, LCS and DTCS. For DTCS you would completely lose the splitting of partitions across time ranges, and for LCS how do you pick what level to put the data in? Within a given level the data is already "compacted", so limiting by level wouldn't help.

An idea I had about this was to allow streaming to happen by sstable, not by token range. For a given sstable you stream it only once, but you skip token ranges in the file that aren't owned by the receiver. You end up with at most the same number of files as the source node had, and for LCS/DTCS those files could stay in the same buckets/levels they started in.
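To make the file-count argument in the comment concrete, here is a rough toy model in Python. It is only a sketch and not Cassandra code: the `SSTable` class, the `(start, end]` range tuples, and both function names are made up for illustration, wrap-around ranges are ignored, and an "sstable" is just a sorted list of integer tokens plus a level. It compares streaming one file per (range, sstable) pair with streaming each sstable once while skipping tokens the receiver does not own.

```python
from dataclasses import dataclass
from typing import List, Tuple

Range = Tuple[int, int]  # hypothetical (start, end] token range, no wrap-around


@dataclass
class SSTable:
    level: int         # LCS level (or DTCS bucket) on the source node
    tokens: List[int]  # partition tokens held by this sstable, sorted


def in_range(token: int, rng: Range) -> bool:
    start, end = rng
    return start < token <= end


def stream_by_range(sstables: List[SSTable], ranges: List[Range]) -> List[SSTable]:
    """One output file per (range, sstable) pair that has data in the range."""
    out = []
    for rng in ranges:
        for sst in sstables:
            part = [t for t in sst.tokens if in_range(t, rng)]
            if part:
                out.append(SSTable(level=sst.level, tokens=part))
    return out


def stream_by_sstable(sstables: List[SSTable], ranges: List[Range]) -> List[SSTable]:
    """One output file per source sstable; tokens outside all ranges are skipped."""
    out = []
    for sst in sstables:
        part = [t for t in sst.tokens if any(in_range(t, r) for r in ranges)]
        if part:
            # The file keeps the level/bucket it had on the source node.
            out.append(SSTable(level=sst.level, tokens=part))
    return out


# Toy input: 5 source sstables, receiver owns 3 disjoint ranges.
source = [SSTable(level=i % 3, tokens=list(range(i, 1000, 7))) for i in range(5)]
ranges = [(0, 100), (300, 400), (700, 800)]

by_range = stream_by_range(source, ranges)
by_sstable = stream_by_sstable(source, ranges)
print(len(by_range), len(by_sstable))
```

With this toy input the per-range approach produces 15 files (5 sstables times 3 ranges) while the per-sstable approach produces 5, one per source sstable, each still tagged with its source level. Both carry the same set of tokens; only the number of files on the receiver differs.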

> Improve the way we do streaming with vnodes
> -------------------------------------------
>
>                 Key: CASSANDRA-10495
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10495
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Marcus Eriksson
>             Fix For: 3.x
>
>
> Streaming with vnodes usually creates a large amount of sstables on the
> target node - for example if each source node has 100 sstables and we use
> num_tokens = 256, the bootstrapping (for example) node might get 100*256
> sstables
> One approach could be to do an on-the-fly compaction on the source node,
> meaning we would only stream out one sstable per range. Note that we will
> want the compaction strategy to decide how to combine the sstables, for
> example LCS will not want to mix sstables from different levels while STCS
> can probably just combine everything
> cc [~yukim]

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)