Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names
Another option, instead of working with raw SSTables, is to use the Spark Migrator [1]. It reads from a source cluster, can apply some transformations (such as renaming tables or columns), and writes to a target cluster. It's a very convenient tool, open source and free of charge.

[1] https://github.com/scylladb/scylla-migrator
Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names
> In terms of speed, the sstableloader should be faster, correct? Maybe the
> DSE BulkLoader finds application when you want a slice of the data and not
> the entire cake. Is it correct?

There's no real direct comparison, because DSBulk is designed for operating on data in CSV or JSON as a replacement for the COPY command.

Cheers!
Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names
Hi everyone,

Is the DSE BulkLoader faster than the sstableloader?

Sometimes I need to make a cluster snapshot and replicate a Cluster A to a Cluster B with fewer performance capabilities but the same data size.

In terms of speed, the sstableloader should be faster, correct?

Maybe the DSE BulkLoader finds application when you want a slice of the data and not the entire cake. Is it correct?

Thanks,

Sergio
RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names
Not sure what you mean by “online” migration. You can load data into the same-named table in cluster B. If the primary keys match, the existing data will be overwritten (effectively; the old values are not actually rewritten on disk).

I think you can pipe the output of a dsbulk unload into a dsbulk load and make the data transfer very quick. Your clusters are very small, so this probably wouldn’t take long.

How you get the client apps to connect to the correct cluster, stop running, etc. is beyond the scope of Cassandra.

Sean Durity – Staff Systems Engineer, Cassandra
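The two points in the reply above (matching primary keys get overwritten, and an unload can be piped straight into a load) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hosts, keyspace, table, and sample rows are hypothetical, `dsbulk` is assumed to be on the PATH with its default CSV connector writing to stdout and reading from stdin, and `-h`, `-k`, and `-t` are taken to be the short options for contact host, keyspace, and table. The command is built as a string and not executed.

```python
import shlex


def colliding_keys(rows_a, rows_b, key_columns):
    """Return the primary-key tuples that occur in both row sets.

    Any key in the result would be overwritten in cluster B by the
    row loaded from cluster A.
    """
    keys_a = {tuple(row[c] for c in key_columns) for row in rows_a}
    keys_b = {tuple(row[c] for c in key_columns) for row in rows_b}
    return keys_a & keys_b


def dsbulk_pipe_command(source_host, target_host, keyspace, table):
    """Build a shell pipeline that streams one table from source to target.

    Assumption: with no file URL given, unload writes CSV to stdout and
    load reads CSV from stdin.
    """
    unload = ["dsbulk", "unload", "-h", source_host, "-k", keyspace, "-t", table]
    load = ["dsbulk", "load", "-h", target_host, "-k", keyspace, "-t", table]
    return " | ".join(shlex.join(cmd) for cmd in (unload, load))


if __name__ == "__main__":
    # Hypothetical sample rows keyed by a single-column primary key.
    cluster_a_rows = [{"customer_id": 1}, {"customer_id": 2}]
    cluster_b_rows = [{"customer_id": 2}, {"customer_id": 3}]
    print(colliding_keys(cluster_a_rows, cluster_b_rows, ["customer_id"]))
    print(dsbulk_pipe_command("10.0.0.1", "10.1.0.1", "app_ks", "orders"))
```

Running a key-overlap check like this on a sample of each table before loading is one way to find out whether a merge would silently replace rows in cluster B.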
Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names
Hi Sean,

You've got all valid points. Please see my answers below:

1. The reason we want to move from 'A' to 'B' is to get rid of the 'A' Azure region completely.
2. The cluster names in 'A' and 'B' are different.
3. DSBulk: is there any way I can do an online migration? I still need to get clarity on whether data for the same keyspace/table names can be merged between A and B.

So there are two cases:

1. If merging is not an issue, I guess DSBulk or sstableloader would be an option?
2. If merging is an issue, I am guessing this won't be possible without an app code change, right?

Thanks & Regards,
Ankit Gadhiya
Re: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names
The migration requirements are impossible given the current state of the database.

You probably can’t join two distinct clusters without app changes and without downtime unless you’re very lucky (same cluster name, app using quorum but not local quorum, both clusters using NetworkTopologyStrategy, neither app using serial reads or writes), and trying to do it with conflicting keyspace and table names makes it impossible.

I would just assume this isn’t possible and look for alternate plans, like downtime or code changes.
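The "very lucky" conditions listed in the reply above can be written down as a small checklist. The following is a rough sketch only: `ClusterProfile`, `join_blockers`, and all sample values are hypothetical names made up for illustration, and the checks encode nothing beyond the conditions stated in that message.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ClusterProfile:
    """Hypothetical summary of one cluster, filled in by hand."""
    cluster_name: str
    replication_strategy: str   # e.g. "NetworkTopologyStrategy"
    consistency_level: str      # level the apps use, e.g. "QUORUM"
    uses_serial: bool           # True if any app does serial reads or writes
    tables: Set[str] = field(default_factory=set)  # "keyspace.table" names


def join_blockers(a: ClusterProfile, b: ClusterProfile) -> List[str]:
    """List the reasons two clusters cannot be joined without app changes."""
    blockers = []
    if a.cluster_name != b.cluster_name:
        blockers.append("cluster names differ")
    for cluster in (a, b):
        if cluster.replication_strategy != "NetworkTopologyStrategy":
            blockers.append(f"{cluster.cluster_name}: not NetworkTopologyStrategy")
        if cluster.consistency_level != "QUORUM":
            blockers.append(f"{cluster.cluster_name}: apps do not use plain QUORUM")
        if cluster.uses_serial:
            blockers.append(f"{cluster.cluster_name}: apps use serial reads or writes")
    shared = sorted(a.tables & b.tables)
    if shared:
        blockers.append("conflicting tables: " + ", ".join(shared))
    return blockers


if __name__ == "__main__":
    # Hypothetical values; only the differing cluster names and the shared
    # keyspace/table names are taken from the thread.
    a = ClusterProfile("cluster_a", "NetworkTopologyStrategy", "QUORUM", False,
                       {"app_ks.orders", "app_ks.users"})
    b = ClusterProfile("cluster_b", "NetworkTopologyStrategy", "QUORUM", False,
                       {"app_ks.orders"})
    for reason in join_blockers(a, b):
        print(reason)
```

For the situation described in this thread, the list is non-empty on at least two counts (different cluster names and shared table names), which matches the conclusion that a join is not a realistic option.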
RE: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names
A couple of things to consider:

* Separating apps into their own clusters is typically a better model, to avoid later entanglements.
* DSBulk (1.4.1) is now available for clusters running only open source Cassandra. It is a great tool for unloading/loading.
* What data problem are you trying to solve with Cassandra and this move to another cluster? If it is high availability, then trying to get to 2 DCs would be important. However, I think you will need at least a new keyspace if you can’t combine the data from the clusters. Whether this requires a code or config change depends on how configurable the developers made the connection and query details. (As a side rant: why is it that developers will write all kinds of new code, but don’t want to touch existing code?)
* Your migration requirements are quite stringent (“we don’t want to change anything, lose anything, or stop anything. Make it happen!”). There may be a solution, but you may end up with something even more fragile afterwards. I would push back to see what is negotiable.

Sean Durity – Staff Systems Engineer, Cassandra

From: Ankit Gadhiya
Sent: Friday, January 17, 2020 8:50 AM
To: user@cassandra.apache.org
Subject: [EXTERNAL] Re: *URGENT* Migration across different Cassandra cluster few having same keyspace/table names

Hi Upasana,

Thanks for your response. I’d love to do that as a first strategy, but since they are both separate clusters, how would I do that? The keyspaces already use NetworkTopologyStrategy with RF=3.

Ankit

On Fri, Jan 17, 2020 at 8:45 AM Upasana Sharma wrote:

Hi,

Did you consider adding the Cassandra nodes from cluster B into cluster A as a different data center?

Your keyspace would then use NetworkTopologyStrategy.

In this case, all data can be synced between both data centers by Cassandra using rebalancing.

At the client/application level you will have to ensure local quorum/local consistency so that there is no impact on latencies.

Once you have moved the applications to the new cluster, you can then remove the old data center (cluster A), and cluster B would have fresh data.

On Fri, Jan 17, 2020, 6:59 PM Ankit Gadhiya wrote:

Thanks, but there’s no DSE license. I'm wondering how sstableloader will help, as some of the keyspace and table names are the same. Also, how do I sync the few system keyspaces?

Thanks & Regards,
Ankit

On Fri, Jan 17, 2020 at 1:11 AM Vova Shelgunov wrote:

Loader*

https://www.datastax.com/blog/2018/05/introducing-datastax-bulk-loader

On Fri, Jan 17, 2020, 09:09 Vova Shelgunov wrote:

DataStax bulk loaded can be an option if data is large.

On Fri, Jan 17, 2020, 07:33 Nitan Kainth wrote:

If the keyspace already exists, use the COPY command or sstableloader to merge the data. If the data volume is too big, consider Spark or a custom Java program.

Regards,
Nitan

On Jan 16, 2020, at 10:26 PM, Ankit Gadhiya wrote:

Any leads on this?

Ankit

On Thu, Jan 16, 2020 at 8:51 PM Ankit Gadhiya wrote:

Hi Arvinder,

Thanks for your response.

Yes, cluster B already has some data. The table/keyspace names are identical; as for the data, I still haven't got clarity on whether it is identical or not. I am assuming not, since it's for different customers, but I need confirmation.

Thanks & Regards,
Ankit Gadhiya

On Thu, Jan 16, 2020 at 8:49 PM Arvinder Dhillon wrote:

So as I understand it, cluster B already has some data and is not an empty cluster.

When you say the clusters share the same keyspace and table names, do you mean both clusters have identical data in those keyspaces/tables?

-Arvi

On Thu, Jan 16, 2020, 5:27 PM Ankit Gadhiya wrote:

Hello Group,

I have a requirement in one of the production systems where I need to be able to migrate the entire dataset from Cluster A (Azure Region A) to Cluster B (Azure Region B).

Each cluster has 3 Cassandra nodes (RF=3) in use by different applications. A few of the applications are common to Cluster A and Cluster B and therefore share the same keyspace/table names.

I need suggestions for the best possible migration strategy here, considering:

1. No application code changes are possible; minor config/infra changes can be considered.
2. Zero data loss.
3. No/minimal downtime.

It'd be great to hear ideas from all of you based on your experiences.

Cassandra version: Cassandra 3.0.13 on both sides.
Total data size: Cluster A: 70 GB, Cluster B: 15 GB

Thanks & Regards,
Ankit
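On the suggestion quoted above to add cluster B as a second data center and have clients use local quorum: the arithmetic behind that advice is short enough to sketch. The data center names `dc_a` and `dc_b` are hypothetical; the RF=3 per data center comes from the thread, and the formula is the usual quorum rule (more than half of the replicas).

```python
def quorum(replication_factors):
    """Replicas needed for QUORUM: a majority of all replicas in all DCs."""
    total = sum(replication_factors.values())
    return total // 2 + 1


def local_quorum(replication_factors, dc):
    """Replicas needed for LOCAL_QUORUM: a majority within one DC only."""
    return replication_factors[dc] // 2 + 1


if __name__ == "__main__":
    # Today: a single data center with RF=3.
    single_dc = {"dc_a": 3}
    # After adding cluster B's nodes as a second data center, also RF=3.
    two_dcs = {"dc_a": 3, "dc_b": 3}

    print(quorum(single_dc))              # majority of 3 replicas
    print(quorum(two_dcs))                # majority of 6 replicas
    print(local_quorum(two_dcs, "dc_a"))  # majority of the 3 local replicas
```

With two data centers at RF=3 each, QUORUM needs 4 of 6 replicas, so every request must wait on at least one replica in the other region. LOCAL_QUORUM needs only 2 of the 3 local replicas, which is why the suggestion asks clients to use it to keep latency unaffected.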