[ https://issues.apache.org/jira/browse/CASSANDRA-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15032986#comment-15032986 ]
Nick Bailey commented on CASSANDRA-10757:
-----------------------------------------
You are seeing the effects of CASSANDRA-4756.
> Cluster migration with sstableloader requires significant compaction time
> -------------------------------------------------------------------------
>
> Key: CASSANDRA-10757
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10757
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction, Streaming and Messaging
> Reporter: Juho Mäkinen
> Priority: Minor
> Labels: sstableloader
>
> When sstableloader is used to migrate data from one cluster into another, the
> loading creates a lot more data and a lot more sstable files than the
> original cluster had.
> For example, in my case a 62-node cluster with 16 TiB of data in 80000
> sstables was sstableloaded into another cluster with 36 nodes, and this
> resulted in 42 TiB of used data in a whopping 350000 sstables.
> The sstableloader process itself was relatively fast (around 8 hours), but
> as a result the destination cluster needs approximately two weeks of
> compaction to reduce the number of sstables back to the original
> levels. (The instances are C4.4xlarge in EC2, 16 cores each, with compaction
> running on 14 cores. The EBS disks in each instance provide 9000 IOPS and a
> maximum of 250 MiB/sec disk bandwidth.)
> Could the sstableloader process somehow be improved to make this kind of
> migration less painful and faster?
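
To put the reported figures side by side, here is a rough sketch of the arithmetic. It uses only the numbers quoted in the description; the variable and function names are illustrative and are not part of any Cassandra tool.

```python
# Figures as quoted in the issue description (source vs. destination cluster).
SRC_NODES, SRC_TIB, SRC_SSTABLES = 62, 16, 80_000
DST_NODES, DST_TIB, DST_SSTABLES = 36, 42, 350_000

MIB_PER_TIB = 1024 * 1024


def summarize():
    """Return growth factors and per-sstable / per-node averages."""
    return {
        # How much the on-disk footprint and file count grew after loading.
        "data_growth": DST_TIB / SRC_TIB,
        "sstable_growth": DST_SSTABLES / SRC_SSTABLES,
        # Average sstable size before and after, in MiB.
        "avg_src_sstable_mib": SRC_TIB * MIB_PER_TIB / SRC_SSTABLES,
        "avg_dst_sstable_mib": DST_TIB * MIB_PER_TIB / DST_SSTABLES,
        # Average number of sstables each node has to manage.
        "src_sstables_per_node": SRC_SSTABLES / SRC_NODES,
        "dst_sstables_per_node": DST_SSTABLES / DST_NODES,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.1f}")
```

These are cluster-wide averages only; they say nothing about how the data is distributed across individual nodes or tables.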