Re: Nodetool Repair questions

2014-08-12 Thread Mark Reddy
Hi Vish,

1. This tool repairs inconsistencies across replicas of the row. Since the
 latest update always wins, I don't see inconsistencies other than ones
 resulting from the combination of deletes, tombstones, and crashed nodes.
 Technically, if data is never deleted from Cassandra, then nodetool repair
 does not need to be run at all. Is this understanding correct? If wrong,
 can anyone provide other ways inconsistencies could occur?


Even if you never delete data, you should run repairs occasionally to ensure
overall consistency. While hinted handoff and read repair do lead to
better consistency, they are only helpers/optimizations and are not regarded
as operations that ensure consistency.
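
To illustrate how replicas can diverge without any deletes, here is a minimal sketch in Python. It is a toy model, not Cassandra code: the replica names, the `write`, `read_one` and `repair` helpers, and the key `user:1` are all made up for the example. It assumes one replica is down during an update and that the hint for it is never delivered.

```python
# Toy model: each replica maps key -> (timestamp, value); highest timestamp wins.
# Illustration only; this is not how Cassandra stores or repairs data internally.

def write(replicas, reachable, key, timestamp, value):
    """Apply a write only to the replicas that are reachable right now."""
    for name in reachable:
        current = replicas[name].get(key)
        if current is None or timestamp > current[0]:
            replicas[name][key] = (timestamp, value)

def read_one(replicas, name, key):
    """Read from a single replica, as a read at consistency level ONE might."""
    cell = replicas[name].get(key)
    return None if cell is None else cell[1]

def repair(replicas):
    """Bring every replica up to the newest version of every key."""
    newest = {}
    for data in replicas.values():
        for key, cell in data.items():
            if key not in newest or cell[0] > newest[key][0]:
                newest[key] = cell
    for data in replicas.values():
        data.update(newest)

replicas = {"a": {}, "b": {}, "c": {}}
write(replicas, ["a", "b", "c"], "user:1", 1, "old")
# Replica "c" is down for the second write and (by assumption) never gets a hint.
write(replicas, ["a", "b"], "user:1", 2, "new")

stale = read_one(replicas, "c", "user:1")  # served from the replica that missed the update
repair(replicas)
fresh = read_one(replicas, "c", "user:1")  # same replica after the repair
print(stale, fresh)
```

No delete or tombstone is involved, yet replica "c" keeps serving the old value until something reconciles it.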

2. Want to understand the performance of 'nodetool repair' in a Cassandra
 multi data center setup. As we add nodes to the cluster in various data
 centers, does the cost of nodetool repair on each node grow linearly or
 quadratically?


It's difficult to predict how long a repair will take; I've seen the time
to completion fluctuate between 4 hours and 10+ hours on the same node.
However, in theory, adding more nodes would spread the data and free up
machine resources, resulting in faster repairs.

The essence of this question is: If I have a keyspace with x number of
 replicas in each data center, do I have to deal with an upper limit on the
 number of data centers/nodes?


Could you expand on why you believe there would be an upper limit on the
number of data centers/nodes due to running repairs?


Mark


On Tue, Aug 12, 2014 at 10:06 PM, Viswanathan Ramachandran 
vish.ramachand...@gmail.com wrote:

 Some questions on nodetool repair.

 1. This tool repairs inconsistencies across replicas of the row. Since the
 latest update always wins, I don't see inconsistencies other than ones
 resulting from the combination of deletes, tombstones, and crashed nodes.
 Technically, if data is never deleted from Cassandra, then nodetool repair
 does not need to be run at all. Is this understanding correct? If wrong,
 can anyone provide other ways inconsistencies could occur?

 2. Want to understand the performance of 'nodetool repair' in a Cassandra
 multi data center setup. As we add nodes to the cluster in various data
 centers, does the cost of nodetool repair on each node grow linearly or
 quadratically? The essence of this question is: If I have a
 keyspace with x number of replicas in each data center, do I have to deal
 with an upper limit on the number of data centers/nodes?


 Thanks

 Vish



Re: Nodetool Repair questions

2014-08-12 Thread Andrey Ilinykh
1. You don't have to repair if you use QUORUM consistency and you don't
delete data.
2. Performance depends on the amount of data each node holds. It is very
difficult to predict; a repair may take days.

Thank you,
  Andrey
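
The reasoning behind point 1 is the usual overlap rule: if the number of replicas that acknowledge a write plus the number consulted on a read exceeds the replication factor, every read touches at least one replica that has the latest write. A minimal sketch of that arithmetic follows; the function names are made up for illustration.

```python
def quorum(replication_factor):
    """Number of replicas that form a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1

def reads_see_latest_write(write_acks, read_acks, replication_factor):
    """True when every read set must overlap every write set (R + W > RF)."""
    return write_acks + read_acks > replication_factor

rf = 3
q = quorum(rf)
print(q)                                   # replicas needed for QUORUM at RF = 3
print(reads_see_latest_write(q, q, rf))    # QUORUM writes with QUORUM reads
print(reads_see_latest_write(1, 1, rf))    # ONE writes with ONE reads
```

With RF = 3, QUORUM on both sides satisfies the rule, while ONE on both sides does not.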




Re: Nodetool Repair questions

2014-08-12 Thread Viswanathan Ramachandran
Thanks Mark,
Since we have replicas in each data center, adding a new data center
(and therefore new replicas) has a performance implication for nodetool
repair. I do understand that adding nodes without increasing the number of
replicas may improve repair performance, but in this case we are adding a
new data center and additional replicas, which is added overhead for
nodetool repair. Hence the thinking that we may reach an upper limit: the
point at which the cost of nodetool repair becomes too high.
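
A rough way to put numbers on this concern, as a sketch only: assume a full repair of a token range compares the data of every pair of replicas holding that range. That is an assumption about how repair works, and the helper names below are made up. With x replicas per data center and D data centers, a range has x * D replicas, and the number of replica pairs grows quadratically with that count.

```python
def replicas_per_range(replicas_per_dc, data_centers):
    """Total replicas of one token range when every data center holds the same RF."""
    return replicas_per_dc * data_centers

def replica_pairs(replica_count):
    """Number of distinct replica pairs, n * (n - 1) / 2."""
    return replica_count * (replica_count - 1) // 2

# Assumed setup: 3 replicas in each data center, growing from 1 to 4 data centers.
growth = []
for data_centers in (1, 2, 3, 4):
    n = replicas_per_range(3, data_centers)
    growth.append((data_centers, n, replica_pairs(n)))
    print(data_centers, n, replica_pairs(n))
```

Under that assumption, going from one to four data centers multiplies the replica count by 4 but the number of pairs to reconcile by 22. This says nothing about actual run time, which also depends on data volume per node.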




Re: Nodetool Repair questions

2014-08-12 Thread Viswanathan Ramachandran
Andrey, QUORUM consistency and no deletes makes perfect sense.
I believe we could extend that to EACH_QUORUM or QUORUM consistency and no
deletes - isn't that right?

Thanks




Re: Nodetool Repair questions

2014-08-12 Thread Andrey Ilinykh
On Tue, Aug 12, 2014 at 4:46 PM, Viswanathan Ramachandran 
vish.ramachand...@gmail.com wrote:

 Andrey, QUORUM consistency and no deletes makes perfect sense.
 I believe we could extend that to EACH_QUORUM or QUORUM consistency and no
 deletes - isn't that right?


Yes.
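
For reference, a small sketch of how many acknowledgements the two levels need in a multi data center keyspace. The data center names and replication factors are made-up examples, and `acks_required` is a hypothetical helper, not a driver API. QUORUM counts a majority of all replicas across data centers, while EACH_QUORUM needs a majority in every data center.

```python
def quorum(replication_factor):
    """floor(RF / 2) + 1"""
    return replication_factor // 2 + 1

def acks_required(level, rf_by_dc):
    """Acknowledgements needed for a consistency level, given per-DC replication factors."""
    if level == "QUORUM":
        return quorum(sum(rf_by_dc.values()))
    if level == "EACH_QUORUM":
        return sum(quorum(rf) for rf in rf_by_dc.values())
    raise ValueError(f"unsupported level: {level}")

rf_by_dc = {"dc1": 3, "dc2": 3, "dc3": 3}   # assumed topology
total = sum(rf_by_dc.values())
w = acks_required("EACH_QUORUM", rf_by_dc)  # write level
r = acks_required("QUORUM", rf_by_dc)       # read level
print(w, r, w + r > total)
```

Because a majority in every data center is at least a majority overall, writing at EACH_QUORUM and reading at QUORUM still satisfies the overlap rule in this example.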