[ https://issues.apache.org/jira/browse/CASSANDRA-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Juho Mäkinen updated CASSANDRA-10757:
-------------------------------------
    Description: 
When sstableloader is used to migrate data from one cluster into another, the
loading creates a lot more data and a lot more sstable files than the
original cluster had.

For example, in my case a 62-node cluster with 16 TiB of data in 80000 sstables
was loaded with sstableloader into another cluster with 36 nodes, and this
resulted in 42 TiB of used data in a whopping 350000 sstables.

The sstableloader process itself was relatively fast (around 8 hours), but
as a result the destination cluster needs approximately two weeks of
compaction to reduce the number of sstables back to the original
levels. (The instances are C4.4xlarge in EC2, 16 cores each, with compaction
running on 14 cores. The EBS disks in each instance provide 9000 IOPS and a
maximum of 250 MiB/sec of disk bandwidth.)

Could the sstableloader process somehow be improved to make this kind of
migration less painful and faster?

  was:
When sstableloader is used to migrate data from one cluster into another, the
loading creates a lot more data and a lot more sstable files than the
original cluster had.

For example, in my case a 62-node cluster with 16 TiB of data in 80000 sstables
was loaded with sstableloader into another cluster with 36 nodes, and this
resulted in 42 TiB of used data in a whopping 350000 sstables.

The sstableloader process itself was relatively fast (around 8 hours), but
as a result the destination cluster needs approximately two weeks of
compaction. (These are C4.4xlarge instances, 16 cores each, with compaction
running on 14 cores, 9000 IOPS, and 250 MiB/sec of sustained disk bandwidth.)

Could the sstableloader process somehow be improved to make this kind of
migration less painful and faster?


> Cluster migration with sstableloader requires significant compaction time
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10757
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10757
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Compaction, Streaming and Messaging
>            Reporter: Juho Mäkinen
>            Priority: Minor
>              Labels: sstableloader
>             Fix For: 2.1.11
>
>
> When sstableloader is used to migrate data from one cluster into another, the
> loading creates a lot more data and a lot more sstable files than the
> original cluster had.
> For example, in my case a 62-node cluster with 16 TiB of data in 80000
> sstables was loaded with sstableloader into another cluster with 36 nodes,
> and this resulted in 42 TiB of used data in a whopping 350000 sstables.
> The sstableloader process itself was relatively fast (around 8 hours), but
> as a result the destination cluster needs approximately two weeks of
> compaction to reduce the number of sstables back to the original
> levels. (The instances are C4.4xlarge in EC2, 16 cores each, with compaction
> running on 14 cores. The EBS disks in each instance provide 9000 IOPS and a
> maximum of 250 MiB/sec of disk bandwidth.)
> Could the sstableloader process somehow be improved to make this kind of
> migration less painful and faster?



