Hi all,

I have spent the last few days trying to move my C* cluster on Google Cloud
(3 nodes, 700 GB) into a DC/OS deployment. As many of you might know, this
was not trivial.

I have finally found a time-efficient way to do this migration (we
evaluated bulk loading and sstableloader, but these would take far too
long, especially if we want to repeat this process between different
deployments).

I would really appreciate it if you could review my approach below and
comment on where I could do something better (or automate it using existing
tools that I might not have stumbled across).

All the data from my previous setup is on persistent disks. I created
copies of those persistent disks and attached them to the DC/OS agents.
When deploying the service on DC/OS, I specified the disk type as MOUNT and
used the same cluster name as in my previous setup.

After the service was successfully deployed, I logged into cqlsh. I could
see all the keyspaces, but all the column families were missing. When I
rechecked the data directory on the persistent disk, all my data was still
there, spread across different directories, each with a hash appended to
its name.

For example, if the table is *data_main_bim_dn_10*, its data directory is
named data_main_bim_dn_10-a73202c02bf311e8b5106b13f463f8b9. I created a new
table with the same name through cqlsh. This resulted in the creation of
another directory with a different hash, i.e.
data_main_bim_dn_10-c146e8d038c611e8b48cb7bc120612c9. I copied all the data
from the former to the latter.

Then I ran *"nodetool refresh ks1  data_main_bim_dn_10"*. After that I was
able to access all data contents through cqlsh.
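
To make that concrete, this is roughly what the copy-and-refresh step looks
like when scripted for a single table (just a sketch: the data directory
path is an assumption, and I pick the older of the two directories by
modification time as the one holding the original SSTables):

import shutil, subprocess
from pathlib import Path

# Path to the Cassandra data directory on the mounted volume -- an
# assumption, adjust for your node.
DATA_DIR = Path("/var/lib/cassandra/data")

def copy_and_refresh(keyspace: str, table: str) -> None:
    ks_dir = DATA_DIR / keyspace
    # Both directories for this table: <table>-<old id> and <table>-<new id>.
    dirs = sorted(
        (d for d in ks_dir.iterdir()
         if d.is_dir() and d.name.startswith(table + "-")),
        key=lambda d: d.stat().st_mtime,
    )
    if len(dirs) != 2:
        raise RuntimeError(f"expected 2 directories for {table}, found {len(dirs)}")
    old_dir, new_dir = dirs  # the older one holds the original SSTables
    for f in old_dir.iterdir():
        if f.is_file():
            shutil.copy2(f, new_dir / f.name)
    # Same as running "nodetool refresh <keyspace> <table>" by hand.
    subprocess.run(["nodetool", "refresh", keyspace, table], check=True)

copy_and_refresh("ks1", "data_main_bim_dn_10")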

Now, the problem is that I have around 500 tables, and the method I
described above is quite cumbersome. Bulk loading through sstableloader or
remote seeding are also options, but they would take a lot of time. Does
anyone know an easier way to shift all my data to the new setup on DC/OS?
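
In case it helps the discussion, this is roughly how I could loop the same
copy-and-refresh over every table on a node as a stopgap (again only a
sketch: it assumes the schema has already been recreated on the new cluster,
e.g. by replaying a DESCRIBE KEYSPACE dump through cqlsh, and the data
directory path and the list of system keyspaces to skip are assumptions). I
would still much prefer a cleaner approach if one exists:

import shutil, subprocess
from collections import defaultdict
from pathlib import Path

DATA_DIR = Path("/var/lib/cassandra/data")   # assumption: adjust for the MOUNT volume
SKIP = {"system", "system_schema", "system_auth",
        "system_distributed", "system_traces"}

for ks_dir in DATA_DIR.iterdir():
    if not ks_dir.is_dir() or ks_dir.name in SKIP:
        continue
    # Group <table>-<id> directories by table name.
    groups = defaultdict(list)
    for d in ks_dir.iterdir():
        if d.is_dir() and "-" in d.name:
            groups[d.name.rpartition("-")[0]].append(d)
    for table, dirs in groups.items():
        if len(dirs) != 2:
            print(f"skipping {ks_dir.name}.{table}: found {len(dirs)} directories")
            continue
        old_dir, new_dir = sorted(dirs, key=lambda d: d.stat().st_mtime)
        for f in old_dir.iterdir():
            if f.is_file():
                shutil.copy2(f, new_dir / f.name)
        subprocess.run(["nodetool", "refresh", ks_dir.name, table], check=True)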

-- 
Faraz Mateen
