Hi all, I have spent the last few days trying to move my C* cluster on Gcloud (3 nodes, 700 GB) into a DC/OS deployment. As some of you might know, this is not trivial.
I have finally found a way to do this migration in a time-efficient manner (we evaluated bulk loading and sstableloader, but these would take far too long, especially since we want to repeat this process between different deployments). I would really appreciate it if you could review my approach below and point out where I can do something better, or automate it using existing tools that I might not have stumbled across.

All the data from my previous setup is on persistent disks. I created copies of those persistent disks and attached them to the DC/OS agents. When deploying the service on DC/OS, I specified the disk type as MOUNT and used the same cluster name as in my previous setup.

After the service was successfully deployed, I logged into cqlsh. I was able to see all the keyspaces, but all the column families were missing. When I rechecked the data directory on the persistent disk, all my data was still there, in separate directories. Each directory has a hash suffix appended to its name. For example, if the table is *data_main_bim_dn_10*, its data directory is named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9.

As a workaround, I created a new table with the same name through cqlsh. This resulted in the creation of another directory with a different hash, i.e. data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all the data from the former directory into the latter, then ran *"nodetool refresh ks1 data_main_bim_dn_10"*. After that, I was able to access all the data through cqlsh.

Now, the problem is that I have around 500 tables, and repeating the method above for each one is quite cumbersome. Bulk loading through sstableloader or remote seeding are also options, but they would take a lot of time. Does anyone know an easier way to move all my data to the new setup on DC/OS?

-- Faraz Mateen
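P.S. In case it helps anyone reviewing: the per-table steps I described (match the old hashed data directory to the freshly created one by table name, copy the sstables over, then run nodetool refresh) could be scripted roughly as below. This is only a sketch, not something I have run against a live cluster; the keyspace name and mount paths in the example invocation are assumptions, and DRY_RUN=1 only prints the commands so you can inspect them before touching real data.

```shell
#!/usr/bin/env bash
# Hedged sketch of the per-table workaround -- assumptions:
#   * the copied persistent disk is mounted and holds the old keyspace data
#     directories (<old_data>/<table>-<hash>/),
#   * matching empty tables were already created through cqlsh, so a new
#     directory <new_data>/<table>-<newhash>/ exists for each table.
set -euo pipefail

run() {  # execute a command, or just echo it when DRY_RUN=1
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "+ $*"; else "$@"; fi
}

migrate_keyspace() {
  local keyspace="$1" old_data="$2" new_data="$3"
  local old_dir name table new_dir
  for old_dir in "$old_data"/*-*; do
    [ -d "$old_dir" ] || continue
    name="${old_dir##*/}"        # e.g. data_main_bim_dn_10-a73202c0...
    table="${name%-*}"           # strip the hash suffix -> table name
    # pick the matching (newest) directory created by cqlsh on the new disk
    new_dir="$(ls -dt "$new_data/$table"-*/ 2>/dev/null | head -n1 || true)"
    if [ -z "$new_dir" ]; then
      echo "WARN: no target directory for $table, skipping" >&2
      continue
    fi
    run cp -a "$old_dir/." "$new_dir"       # copy sstables into the new dir
    run nodetool refresh "$keyspace" "$table"
  done
}

# Example invocation (paths are assumptions -- adjust to your mounts):
# DRY_RUN=1 migrate_keyspace ks1 /mnt/old-disk/data/ks1 /var/lib/cassandra/data/ks1
```

This just loops the exact manual procedure over every table directory, so it carries the same caveats (schema must already exist, and nodetool refresh must run on each node that received a copied disk).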