Re: Data synchronization between 2 running clusters on different availability zone
On Thu, Nov 27, 2014 at 1:24 AM, Spico Florin spicoflo...@gmail.com wrote:

I have another question. What about the following scenario: two Cassandra instances installed on different cloud providers (EC2, Flexiant)? How do you synchronize them? Can you use some internal tools or do I have to implement my own mechanism?

That's what I meant by "if maybe hybrid in the future, use GossipingPropertyFileSnitch (GPFS)":

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html

"Hybrid" in this case means AWS-and-not-AWS.

=Rob
Re: Data synchronization between 2 running clusters on different availability zone
Here's a snitch we use for this situation - it uses a property file if it exists, but falls back to EC2 autodiscovery if it is missing.

https://github.com/barchart/cassandra-plugins/blob/master/src/main/java/com/barchart/cassandra/plugins/snitch/GossipingPropertyFileWithEC2FallbackSnitch.java
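For readers who want the general idea without reading the Java, here is a minimal Python sketch of the "property file first, EC2 fallback second" pattern described above. It is not the linked plugin's code: `resolve_topology`, `parse_properties` and `fake_ec2_lookup` are hypothetical names, and the fallback is a stand-in for real EC2 metadata discovery. The only thing taken from Cassandra itself is that the property file carries `dc` and `rack` keys.

```python
import os


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def resolve_topology(props_path, ec2_fallback):
    """Return (dc, rack) from the property file if it exists,
    otherwise from the supplied fallback callable."""
    if os.path.exists(props_path):
        with open(props_path) as fh:
            props = parse_properties(fh.read())
        return props["dc"], props["rack"]
    return ec2_fallback()


def fake_ec2_lookup():
    # Hypothetical stand-in for EC2 autodiscovery; a real snitch would
    # query the instance metadata service instead of returning constants.
    return ("us-east", "1a")


if __name__ == "__main__":
    # With no property file present, the fallback decides the topology.
    print(resolve_topology("no_such_file.properties", fake_ec2_lookup))
```

The point of the design is that the same configuration works on EC2 nodes (no file needed) and on non-AWS nodes (drop in a property file), which is what a hybrid deployment needs.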
Re: Data synchronization between 2 running clusters on different availability zone
Hello, Rob! Thank you very much for the detailed support. Regards, Florin
Re: Data synchronization between 2 running clusters on different availability zone
Hello! I have another question. What about the following scenario: two Cassandra instances installed on different cloud providers (EC2, Flexiant)? How do you synchronize them? Can you use some internal tools or do I have to implement my own mechanism? Thanks. Florin
Re: Data synchronization between 2 running clusters on different availability zone
There's no reason you can't run on multiple cloud providers as long as you treat them as logically distinct DC's. It should largely work the same way as running in several AWS regions, but you'll need to use something like GossipingPropertyFileSnitch because the EC2 snitches are specific to AWS.
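To make the "treat each provider as a logically distinct DC" advice concrete, here is a small Python sketch that renders the two pieces of configuration involved: the `cassandra-rackdc.properties` content that GossipingPropertyFileSnitch reads on each node, and a `CREATE KEYSPACE` statement using NetworkTopologyStrategy. The data center names `aws_east` and `flexiant`, the rack name `rack1`, the keyspace name `demo` and the per-DC replica counts are all made up for illustration; the helper functions are hypothetical, not part of any Cassandra tooling.

```python
def rackdc_properties(dc, rack):
    """Render the contents of cassandra-rackdc.properties for one node."""
    return f"dc={dc}\nrack={rack}\n"


def create_keyspace_cql(keyspace, replicas_per_dc):
    """Render a CREATE KEYSPACE statement that places a given number
    of replicas in each named data center."""
    options = ", ".join(f"'{dc}': {rf}" for dc, rf in replicas_per_dc.items())
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', {options}}};"
    )


if __name__ == "__main__":
    # One node in each (hypothetical) data center.
    print(rackdc_properties("aws_east", "rack1"))
    print(rackdc_properties("flexiant", "rack1"))
    # Three replicas in each data center.
    print(create_keyspace_cql("demo", {"aws_east": 3, "flexiant": 3}))
```

The data center names in the keyspace definition have to match the names the snitch reports for the nodes, otherwise no replicas are placed in that DC. Once the keyspace spans both DCs, Cassandra's normal replication keeps them in sync; no separate synchronization tool is involved.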
Data synchronization between 2 running clusters on different availability zone
Hello!

I have the following scenario:

1. For ensuring high availability I would like to install one Cassandra cluster on one availability zone (on Amazon EC2 US-east) and one Cassandra cluster on another AZ (Amazon EC2 US-west).
2. I have a pipeline that is running on Amazon EC2-EAST and is feeding the Cassandra cluster installed on this AZ.

Here are my questions:

1. Is this scenario feasible?
2. Is the architecture correct regarding the availability of Cassandra?
3. If the architecture is fine, how do you keep data synchronized between the two instances?

I look forward to your answers.

Regards,
Florin
Re: Data synchronization between 2 running clusters on different availability zone
On Tue, Nov 25, 2014 at 7:09 AM, Spico Florin spicoflo...@gmail.com wrote:

1. For ensuring high availability I would like to install one Cassandra cluster on one availability zone (on Amazon EC2 US-east) and one Cassandra cluster on other AZ (Amazon EC2 US-west).

One cluster with a replication factor of 2, configured with a rack-aware snitch, is how this is usually done. Well, more accurately, people usually deploy with at least RF=3 and across 3 AZs. An RF of at least 3 is also needed if you want the QUORUM consistency level to keep working while a replica is down; with RF=2, QUORUM requires both replicas to respond.

If you will always operate only out of EC2, you probably want to look into the EC2Snitch. If you plan to ultimately go multi-region, EC2MultiRegionSnitch. If maybe hybrid in the future, GossipingPropertyFileSnitch.

http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2_t.html
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchEC2MultiRegion_c.html
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architectureSnitchGossipPF_c.html

For some good meta on the internals, see: https://issues.apache.org/jira/browse/CASSANDRA-3810

=Rob
http://twitter.com/rcolidba
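The RF=3 recommendation follows from quorum arithmetic: a quorum is a majority of the replicas, i.e. floor(RF / 2) + 1. A short Python sketch of that calculation (the function names are just for illustration):

```python
def quorum(rf):
    """Number of replicas that must respond at QUORUM: a strict majority."""
    return rf // 2 + 1


def tolerated_replica_failures(rf):
    """How many replicas can be down while QUORUM still succeeds."""
    return rf - quorum(rf)


if __name__ == "__main__":
    for rf in (2, 3, 5):
        print(f"RF={rf}: quorum={quorum(rf)}, "
              f"tolerates {tolerated_replica_failures(rf)} replica(s) down")
```

With RF=2 the quorum is 2, so a single unavailable replica makes QUORUM reads and writes fail for the affected ranges. With RF=3 the quorum is still 2, which leaves room for one replica to be down.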