Writing to multiple tables
Hi,

Are Cassandra BATCH statements (http://www.datastax.com/documentation/cql/3.1/cql/cql_reference/batch_r.html) the recommended way to update the same information in multiple tables? For example, if I have the following tables:

- person_by_dob
- person_by_ssn
- person_by_lastname

then the addition or modification of a person results in three writes. Is BATCH the recommended way of updating all three tables in one go, so that the information between the three tables stays consistent? In other words, is it an established Cassandra usage pattern to use the BATCH feature for this purpose? Are there alternate approaches and recommendations?

Thanks
Vish
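For illustration, a logged (atomic) batch writing one person to all three lookup tables might look like the sketch below. The column layouts are assumptions — the original post does not give the schemas:

```cql
-- Hypothetical schemas: each table is keyed by its lookup column.
-- A logged batch guarantees that either all three writes eventually
-- happen or none do (atomicity), though not that they become visible
-- at the same instant (no isolation).
BEGIN BATCH
  INSERT INTO person_by_dob      (dob, person_id, name)
    VALUES ('1980-01-15', 42, 'Jane Doe');
  INSERT INTO person_by_ssn      (ssn, person_id, name)
    VALUES ('123-45-6789', 42, 'Jane Doe');
  INSERT INTO person_by_lastname (lastname, person_id, name)
    VALUES ('Doe', 42, 'Jane Doe');
APPLY BATCH;
```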
Consistency Level for Atomic Batches
Is the consistency level honored for batch statements? If I have 100 insert/update statements in my batch and use LOCAL_QUORUM consistency, will control return from the coordinator only after a local quorum write has been done for all 100 statements? Or does it work differently?

Thanks
Vish
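In cqlsh, the consistency level is set once and then applies to subsequent statements, including a batch as a whole. A sketch with hypothetical table and column names:

```cql
-- Set the consistency level for the session; it applies to the whole
-- batch, not per statement. Table/column names are assumed, and the
-- WHERE clauses assume the lookup column is the full primary key.
CONSISTENCY LOCAL_QUORUM;

BEGIN BATCH
  UPDATE person_by_ssn SET name = 'Jane Smith' WHERE ssn = '123-45-6789';
  UPDATE person_by_dob SET name = 'Jane Smith' WHERE dob = '1980-01-15';
APPLY BATCH;
```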
Re: Consistency Level for Atomic Batches
A follow-up on the earlier question: I meant to ask whether control returns to the client after the batch log is written on the coordinator, irrespective of the consistency level specified. Also: does the coordinator attempt the statements one after the other, or in parallel?

Thanks

On Tue, Sep 16, 2014 at 8:00 AM, Viswanathan Ramachandran vish.ramachand...@gmail.com wrote:

> Is consistency level honored for batch statements? If I have 100
> insert/update statements in my batch and use LOCAL_QUORUM consistency,
> will the control from coordinator return only after a local quorum update
> has been done for all the 100 statements? Or is it different?
>
> Thanks
> Vish
Re: Scala driver
I haven't used this Scala driver, but may explore it soon:

https://github.com/websudosuk/phantom
https://websudosuk.github.io/phantom/

Mentioned at:
http://planetcassandra.org/client-drivers-tools/#Scala
http://www.datastax.com/download#dl-community-drivers

On Mon, Sep 1, 2014 at 10:14 PM, Gary Zhao garyz...@gmail.com wrote:

> Thanks Jan. I decided to use the Java driver directly. It's not hard to use.
>
> On Sun, Aug 31, 2014 at 1:08 AM, Jan Algermissen jan.algermis...@nordsc.com wrote:
>
>> Hi Gary,
>>
>> On 31 Aug 2014, at 07:19, Gary Zhao garyz...@gmail.com wrote:
>>
>>> Hi
>>>
>>> Could you recommend a Scala driver and share your experiences of using
>>> it? I'm thinking of using the Java driver in Scala directly.
>>
>> I am using Martin's approach without any problems:
>> https://github.com/magro/play2-scala-cassandra-sample
>>
>> The actual mapping from Java to Scala futures for the async case is in
>> https://github.com/magro/play2-scala-cassandra-sample/blob/master/app/models/Utils.scala
>>
>> HTH,
>> Jan
>
> Thanks
Re: LOCAL_QUORUM without a replica in current data center
Sorry for the spam, but I wanted to double-check whether anyone has experience with such a scenario. Thanks.

On Sun, Aug 17, 2014 at 7:11 PM, Viswanathan Ramachandran vish.ramachand...@gmail.com wrote:

> Hi,
>
> How does LOCAL_QUORUM read/write behave when the data center on which the
> query is executed does not have a replica of the keyspace? Does it result
> in an error, or can it be configured to do LOCAL_QUORUM on the nearest
> data center (as determined by the dynamic snitch) which has the replicas?
>
> We are essentially trying to design a Cassandra cluster with a keyspace
> only in certain regional-hub data centers, to keep the number of replicas
> under control. I am curious to know whether a Cassandra node outside the
> regional-hub data centers can handle LOCAL_QUORUM operations, or whether
> clients really need a connection to the hub data center with the replicas
> to use that consistency level.
>
> Thanks
> Vish
LOCAL_QUORUM without a replica in current data center
Hi,

How does LOCAL_QUORUM read/write behave when the data center on which the query is executed does not have a replica of the keyspace? Does it result in an error, or can it be configured to do LOCAL_QUORUM on the nearest data center (as determined by the dynamic snitch) which has the replicas?

We are essentially trying to design a Cassandra cluster with a keyspace only in certain regional-hub data centers, to keep the number of replicas under control. I am curious to know whether a Cassandra node outside the regional-hub data centers can handle LOCAL_QUORUM operations, or whether clients really need a connection to the hub data center with the replicas to use that consistency level.

Thanks
Vish
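The setup being described can be sketched as a keyspace whose replication map names only the hub data centers; the keyspace and DC names below are assumptions for illustration:

```cql
-- Replicas exist only in the two hub data centers; nodes in any other
-- DC hold no data for this keyspace. DC names must match those reported
-- by the snitch.
CREATE KEYSPACE regional_hub_data
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'hub_east': 3,
    'hub_west': 3
  };
```

Note that LOCAL_QUORUM is computed against the replication factor of the coordinator's own data center, so a coordinator in a DC absent from this map has zero local replicas to form a quorum from — as far as I know it returns an unavailable error rather than silently redirecting to another DC.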
Nodetool Repair questions
Some questions on nodetool repair.

1. This tool repairs inconsistencies across replicas of a row. Since the latest update always wins, I don't see inconsistencies arising other than from the combination of deletes, tombstones, and crashed nodes. Technically, if data is never deleted from Cassandra, then nodetool repair does not need to be run at all. Is this understanding correct? If not, can anyone describe other ways inconsistencies could occur?

2. I want to understand the performance of nodetool repair in a Cassandra multi-data-center setup. As we add nodes to the cluster in various data centers, does the cost of nodetool repair on each node increase linearly, or quadratically? The essence of this question is: if I have a keyspace with x replicas in each data center, do I have to deal with an upper limit on the number of data centers/nodes?

Thanks
Vish
Re: Nodetool Repair questions
Thanks Mark,

Since we have replicas in each data center, the addition of a new data center (and new replicas) has a performance implication for nodetool repair. I do understand that adding nodes without increasing the number of replicas may improve repair performance, but in this case we are adding a new data center and additional replicas, which is added overhead for nodetool repair. Hence the thinking that we may reach an upper limit, which would be the point at which the nodetool repair costs are far too high.

On Tue, Aug 12, 2014 at 2:59 PM, Mark Reddy mark.re...@boxever.com wrote:

> Hi Vish,
>
>> 1. This tool repairs inconsistencies across replicas of the row. Since
>> latest update always wins, I don't see inconsistencies other than ones
>> resulting from the combination of deletes, tombstones, and crashed nodes.
>> Technically, if data is never deleted from Cassandra, then nodetool repair
>> does not need to be run at all. Is this understanding correct? If wrong,
>> can anyone provide other ways inconsistencies could occur?
>
> Even if you never delete data you should run repairs occasionally to
> ensure overall consistency. While hinted handoffs and read repairs do lead
> to better consistency, they are only helpers/optimizations and are not
> regarded as operations that ensure consistency.
>
>> 2. Want to understand the performance of 'nodetool repair' in a Cassandra
>> multi data center setup. As we add nodes to the cluster in various data
>> centers, does the performance of nodetool repair on each node increase
>> linearly, or is it quadratic?
>
> It's difficult to calculate the performance of a repair; I've seen the
> time to completion fluctuate between 4hrs and 10hrs+ on the same node.
> However, in theory adding more nodes would spread the data and free up
> machine resources, thus resulting in more performant repairs.
>
>> The essence of this question is: If I have a keyspace with x number of
>> replicas in each data center, do I have to deal with an upper limit on the
>> number of data centers/nodes?
>
> Could you expand on why you believe there would be an upper limit of
> DCs/nodes due to running repairs?
>
> Mark
Re: Nodetool Repair questions
Andrey,

QUORUM consistency and no deletes makes perfect sense. I believe we could broaden that to "EACH_QUORUM or QUORUM consistency and no deletes" — isn't that right?

Thanks

On Tue, Aug 12, 2014 at 3:10 PM, Andrey Ilinykh ailin...@gmail.com wrote:

> 1. You don't have to repair if you use QUORUM consistency and you don't
> delete data.
>
> 2. Performance depends on the size of data each node has. It's very
> difficult to predict. It may take days.
>
> Thank you,
> Andrey
Is nodetool cleanup necessary on nodes of different data center when new node is added
I plan to have a multi-data-center Cassandra 2 setup with 2-4 nodes per data center and several tens of data centers. We have keyspaces replicated on a certain number of nodes in *each* data center. Essentially, each data center has a logical ring that covers all token ranges. We have a vnode-based deployment, so tokens should get assigned to the nodes automatically.

The documentation at http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_add_node_to_cluster_t.html suggests that the addition of a new node requires cleanup to be run on all other nodes of the cluster. However, it does not clarify the procedure in a multi-data-center setup. My understanding is that nodetool cleanup removes data which no longer belongs to a node. When a new data center is being set up, we are creating completely new replicas, and as far as I can tell this does not result in data movement/rebalancing outside of the new data center, hence there is no cleanup requirement on nodes of other data centers.

Can someone confirm whether my understanding is right, and cleanup is not required on nodes of other data centers?

Thanks
Vish
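The "new data center" case being described corresponds to extending the keyspace's replication map rather than changing ownership inside existing rings; the keyspace and DC names below are assumptions for illustration:

```cql
-- Extending replication to a new DC creates brand-new replicas there
-- (typically populated with 'nodetool rebuild' on the new nodes).
-- Token ownership in the existing data centers is unchanged, which is
-- why no data becomes orphaned on their nodes.
ALTER KEYSPACE my_keyspace
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'dc_existing': 3,
    'dc_new': 3
  };
```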