Re: Nodetool Repair questions
Hi Vish,

> 1. This tool repairs inconsistencies across replicas of the row. Since latest update always wins, I don't see inconsistencies other than ones resulting from the combination of deletes, tombstones, and crashed nodes. Technically, if data is never deleted from Cassandra, then nodetool repair does not need to be run at all. Is this understanding correct? If wrong, can anyone provide other ways inconsistencies could occur?

Even if you never delete data, you should run repairs occasionally to ensure overall consistency. While hinted handoff and read repair do lead to better consistency, they are only helpers/optimizations and are not regarded as operations that ensure consistency.

> 2. Want to understand the performance of 'nodetool repair' in a Cassandra multi data center setup. As we add nodes to the cluster in various data centers, does the performance of nodetool repair on each node increase linearly, or is it quadratic?

It's difficult to calculate the performance of a repair; I've seen the time to completion fluctuate between 4 hours and 10+ hours on the same node. However, in theory, adding more nodes would spread the data and free up machine resources, thus resulting in more performant repairs.

> The essence of this question is: If I have a keyspace with x number of replicas in each data center, do I have to deal with an upper limit on the number of data centers/nodes?

Could you expand on why you believe there would be an upper limit on DCs/nodes due to running repairs?

Mark

On Tue, Aug 12, 2014 at 10:06 PM, Viswanathan Ramachandran vish.ramachand...@gmail.com wrote:

> Some questions on nodetool repair.
>
> 1. This tool repairs inconsistencies across replicas of the row. Since latest update always wins, I don't see inconsistencies other than ones resulting from the combination of deletes, tombstones, and crashed nodes. Technically, if data is never deleted from Cassandra, then nodetool repair does not need to be run at all. Is this understanding correct? If wrong, can anyone provide other ways inconsistencies could occur?
>
> 2. Want to understand the performance of 'nodetool repair' in a Cassandra multi data center setup. As we add nodes to the cluster in various data centers, does the performance of nodetool repair on each node increase linearly, or is it quadratic? The essence of this question is: If I have a keyspace with x number of replicas in each data center, do I have to deal with an upper limit on the number of data centers/nodes?
>
> Thanks
> Vish
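To illustrate how replicas can diverge without any deletes, here is a minimal sketch of last-write-wins reconciliation. It is a toy model, not Cassandra code: the dictionaries, the `reconcile`, `read`, and `repair` helpers, and the timestamps are all assumptions made up for the example.

```python
# Toy model (hypothetical): each replica maps key -> (write_timestamp, value).

def reconcile(*versions):
    """Last-write-wins: return the version with the highest timestamp."""
    present = [v for v in versions if v is not None]
    return max(present, key=lambda v: v[0]) if present else None

def read(replicas, key):
    """Read from the given replicas and reconcile their answers."""
    return reconcile(*(r.get(key) for r in replicas))

def repair(replicas, key):
    """Bring every replica up to the winning version (what a repair achieves)."""
    winner = reconcile(*(r.get(key) for r in replicas))
    for r in replicas:
        r[key] = winner

replica_a = {"user:1": (100, "alice@old.example")}
replica_b = {"user:1": (100, "alice@old.example")}
replica_c = {"user:1": (100, "alice@old.example")}

# An update at timestamp 200 reaches A and B. C is down and never receives it
# (assume, for the sake of the example, that no hint was replayed to it).
for replica in (replica_a, replica_b):
    replica["user:1"] = (200, "alice@new.example")

stale = read([replica_c], "user:1")             # a read answered by C alone
fresh = read([replica_a, replica_c], "user:1")  # a read that also reaches A

repair([replica_a, replica_b, replica_c], "user:1")
after_repair = read([replica_c], "user:1")
```

No delete or tombstone is involved, yet C serves old data until something copies the newer version to it. Latest-update-wins only decides which version is correct once the versions are compared; it does not deliver the missing write.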
Re: Nodetool Repair questions
1. You don't have to repair if you use QUORUM consistency and you don't delete data.

2. Performance depends on size of data each node has. It's very difficult to predict. It may take days.

Thank you,
Andrey
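The QUORUM rule above rests on simple arithmetic: if the number of replicas acknowledging a write plus the number consulted on a read exceeds the replication factor, the two sets must share at least one replica. A small sketch of that check follows; the function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Size of a quorum: a strict majority of the replicas."""
    return replication_factor // 2 + 1

def read_overlaps_write(replication_factor: int, write_acks: int, read_acks: int) -> bool:
    """True when every read set must share a replica with every write set."""
    return write_acks + read_acks > replication_factor

rf = 3
q = quorum(rf)

print("quorum for RF=3:", q)
print("QUORUM write + QUORUM read overlap:", read_overlaps_write(rf, q, q))
print("ONE write + ONE read overlap:", read_overlaps_write(rf, 1, 1))
```

Note that this only says a quorum read will see the latest acknowledged write. A replica that missed the write still holds stale data on disk until read repair or nodetool repair fixes it.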
Re: Nodetool Repair questions
Thanks Mark,

Since we have replicas in each data center, adding a new data center (and therefore new replicas) has a performance implication for nodetool repair. I understand that adding nodes without increasing the number of replicas may improve repair performance, but in this case we are adding a new data center and additional replicas, which is added overhead for nodetool repair. Hence the thinking that we may reach an upper limit: the point at which the cost of nodetool repair becomes too high.
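A back-of-the-envelope way to look at this concern: with x replicas per data center, every added data center raises the total replica count for each token range. The sketch below assumes, as a simplification, that repairing a range compares data between every pair of its replicas; the real cost depends on the Cassandra version, the repair options used, and how much data actually differs. The function names are hypothetical.

```python
def total_replicas(replicas_per_dc: int, data_centers: int) -> int:
    """Total copies of each row when every DC holds the same number of replicas."""
    return replicas_per_dc * data_centers

def replica_pairs(replication_factor: int) -> int:
    """Number of distinct replica pairs to compare for one token range."""
    return replication_factor * (replication_factor - 1) // 2

replicas_per_dc = 3  # the "x" in the question; 3 is only an example value
for dcs in (1, 2, 3, 4):
    rf = total_replicas(replicas_per_dc, dcs)
    print(f"{dcs} DC(s): total replicas = {rf}, pairs per range = {replica_pairs(rf)}")
```

Under this assumption the total replica count grows linearly with the number of data centers, while the pairwise comparisons per range grow roughly quadratically. Adding nodes inside an existing data center changes neither number; it only shrinks the amount of data each node owns.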
Re: Nodetool Repair questions
Andrey,

QUORUM consistency and no deletes makes perfect sense. I believe we could modify that to EACH_QUORUM or QUORUM consistency and no deletes. Isn't that right?

Thanks
Re: Nodetool Repair questions
On Tue, Aug 12, 2014 at 4:46 PM, Viswanathan Ramachandran vish.ramachand...@gmail.com wrote:

> Andrey, QUORUM consistency and no deletes makes perfect sense. I believe we could modify that to EACH_QUORUM or QUORUM consistency and no deletes - isnt that right?

yes.
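A short sketch of why the answer is yes, counting acknowledgements only. QUORUM needs a majority of all replicas across data centers, while EACH_QUORUM needs a majority inside every data center. The data center names and replica counts below are example values, and the helper names are hypothetical.

```python
def quorum(replication_factor: int) -> int:
    """A strict majority of the given number of replicas."""
    return replication_factor // 2 + 1

def quorum_acks(rf_by_dc: dict) -> int:
    """Acks needed for QUORUM: a majority of all replicas, wherever they live."""
    return quorum(sum(rf_by_dc.values()))

def each_quorum_acks(rf_by_dc: dict) -> dict:
    """Acks needed for EACH_QUORUM: a majority within every data center."""
    return {dc: quorum(rf) for dc, rf in rf_by_dc.items()}

keyspace = {"dc1": 3, "dc2": 3, "dc3": 3}  # example layout, 3 replicas per DC

global_needed = quorum_acks(keyspace)
per_dc_needed = each_quorum_acks(keyspace)
each_total = sum(per_dc_needed.values())

print("QUORUM acks needed:", global_needed)
print("EACH_QUORUM acks needed per DC:", per_dc_needed)
print("EACH_QUORUM total acks:", each_total)
```

Because a majority in every data center always adds up to at least a majority overall, an EACH_QUORUM write satisfies everything a QUORUM write does, so the same overlap argument with QUORUM reads applies.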