Re: Partition range incremental repairs

2017-06-19 Thread Chris Stokesmore
Anyone have any more thoughts on this at all? Struggling to understand it.


> On 9 Jun 2017, at 11:32, Chris Stokesmore <chris.elsm...@demandlogic.co> 
> wrote:
> 
> Hi Anuj,
> 
> Thanks for the reply.
> 
> 1) We are using Cassandra 2.2.8, and the repair commands we are comparing are
> "nodetool repair --in-local-dc --partitioner-range" and
> "nodetool repair --in-local-dc".
> Since 2.2 I believe incremental repairs are the default - that seems to be
> confirmed in the logs that list the repair details when a repair starts.
> 
> 2) From looking at a few runs, on average:
> with -pr repairs, each node takes approx 6.5-8 hours, so a total over the 7
> nodes of ~53 hours;
> with just inc repairs, each node takes ~26-29 hours, so a total of ~193 hours.
> 
> 3) We currently have two DCs in total: the 'production' ring with 7 nodes and
> RF=3, and a testing ring with a single node and RF=1 for the single keyspace
> we currently use.
> 
> 4) Yeah, that number came from the Cassandra repair logs for an inc repair. I
> can share the number reported when using a pr repair later this evening, when
> the currently running repair has completed.
> 
> 
> Many thanks for the reply again,
> 
> Chris
> 
> 
>> On 6 Jun 2017, at 17:50, Anuj Wadehra <anujw_2...@yahoo.co.in 
>> <mailto:anujw_2...@yahoo.co.in>> wrote:
>> 
>> Hi Chris,
>> 
>> Can you share the following info:
>> 
>> 1. Exact repair commands you use for inc repair and pr repair
>> 
>> 2. Repair time should be measured at cluster level for inc repair. So, what's
>> the total time it takes to run repair on all nodes for incremental vs pr
>> repairs?
>> 
>> 3. You are repairing one DC, DC3. How many DCs are there in total and what's
>> the RF for your keyspaces? Running pr on a specific DC would not repair the
>> entire data set.
>> 
>> 4. 885 ranges? Where did you get this number? The logs? Can you share the
>> number of ranges printed in the logs for both the inc and pr cases?
>> 
>> 
>> Thanks
>> Anuj
>> 
>> 
>> Sent from Yahoo Mail on Android 
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>> On Tue, Jun 6, 2017 at 9:33 PM, Chris Stokesmore
>> <chris.elsm...@demandlogic.co <mailto:chris.elsm...@demandlogic.co>> wrote:
>> Thank you for the excellent and clear description of the different versions
>> of repair, Anuj - that has cleared up what I should expect to be happening.
>> 
>> The problem now is that in our cluster we are running repairs with options
>> (parallelism: parallel, primary range: false, incremental: true, job
>> threads: 1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges:
>> 885), and these repairs are taking over a day to complete, whereas previously,
>> when running with the partition range option, they were taking more like 8-9
>> hours.
>> 
>> As I understand it, using incremental should have sped this process up, as
>> all three sets of data on each repair job should be marked as repaired;
>> however, this does not seem to be the case. Any ideas?
>> 
>> Chris
>> 
>>> On 6 Jun 2017, at 16:08, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID 
>>> <mailto:anujw_2...@yahoo.co.in.INVALID>> wrote:
>>> 
>>> Hi Chris,
>>> 
>>> Using pr with incremental repairs does not make sense. Primary range repair
>>> is an optimization over full repair. If you run a full repair on an n-node
>>> cluster with RF=3, you would be repairing each piece of data three times.
>>> E.g. in a 5-node cluster with RF=3, a range may exist on nodes A, B and C.
>>> When a full repair is run on node A, the entire data in that range gets
>>> synced with the replicas on nodes B and C. Now, when you run full repair on
>>> nodes B and C, you are wasting resources on repairing data which is already
>>> repaired.
>>> 
>>> Primary range repair ensures that when you run repair on a node, it ONLY
>>> repairs the data which is owned by the node. Thus, no node repairs data
>>> which it does not own and which must be repaired by another node. Redundant
>>> work is eliminated.
>>> 
>>> Even with pr, each time you run pr on all nodes you repair 100% of the data.
>>> Why repair the complete data set in each cycle - even data which has not
>>> changed since the last repair cycle?
>>> 
>>> This is where incremental repair comes in as an improvement. Once repaired,
>>> data is marked as repaired so that the next repair cycle can focus on just
>>> the delta. Now, let's

Re: Partition range incremental repairs

2017-06-09 Thread Chris Stokesmore
> 
> I can't recommend *anyone* use incremental repair, as there are some pretty
> horrible bugs in it that can cause Merkle trees to wildly mismatch & result
> in massive overstreaming.  Check out
> https://issues.apache.org/jira/browse/CASSANDRA-9143.
> 
> TL;DR: Do not use incremental repair before 4.0.

Hi Jonathan,

Thanks for your reply - this is a slightly scary message for us! 2.2 has been
out for nearly 2 years and incremental repairs are the default - and they have
horrible bugs!?
I guess massive overstreaming, while a performance issue, does not affect data
integrity.

Are there any plans to backport this to 3.x or, ideally, 2.2?

Chris



> On Tue, Jun 6, 2017 at 9:54 AM Anuj Wadehra <anujw_2...@yahoo.co.in.invalid> 
> wrote:
> Hi Chris,
> 
> Can you share the following info:
> 
> 1. Exact repair commands you use for inc repair and pr repair
> 
> 2. Repair time should be measured at cluster level for inc repair. So, what's
> the total time it takes to run repair on all nodes for incremental vs pr
> repairs?
> 
> 3. You are repairing one DC, DC3. How many DCs are there in total and what's
> the RF for your keyspaces? Running pr on a specific DC would not repair the
> entire data set.
> 
> 4. 885 ranges? Where did you get this number? The logs? Can you share the
> number of ranges printed in the logs for both the inc and pr cases?
> 
> 
> Thanks
> Anuj
> 
> 
> Sent from Yahoo Mail on Android 
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> On Tue, Jun 6, 2017 at 9:33 PM, Chris Stokesmore
> <chris.elsm...@demandlogic.co <mailto:chris.elsm...@demandlogic.co>> wrote:
> Thank you for the excellent and clear description of the different versions
> of repair, Anuj - that has cleared up what I should expect to be happening.
> 
> The problem now is that in our cluster we are running repairs with options
> (parallelism: parallel, primary range: false, incremental: true, job threads:
> 1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges: 885), and
> these repairs are taking over a day to complete, whereas previously, when
> running with the partition range option, they were taking more like 8-9 hours.
> 
> As I understand it, using incremental should have sped this process up, as all
> three sets of data on each repair job should be marked as repaired; however,
> this does not seem to be the case. Any ideas?
> 
> Chris
> 
>> On 6 Jun 2017, at 16:08, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID 
>> <mailto:anujw_2...@yahoo.co.in.INVALID>> wrote:
>> 
>> Hi Chris,
>> 
>> Using pr with incremental repairs does not make sense. Primary range repair
>> is an optimization over full repair. If you run a full repair on an n-node
>> cluster with RF=3, you would be repairing each piece of data three times.
>> E.g. in a 5-node cluster with RF=3, a range may exist on nodes A, B and C.
>> When a full repair is run on node A, the entire data in that range gets synced
>> with the replicas on nodes B and C. Now, when you run full repair on nodes B
>> and C, you are wasting resources on repairing data which is already repaired.
>> 
>> Primary range repair ensures that when you run repair on a node, it ONLY
>> repairs the data which is owned by the node. Thus, no node repairs data
>> which it does not own and which must be repaired by another node. Redundant
>> work is eliminated.
>> 
>> Even with pr, each time you run pr on all nodes you repair 100% of the data.
>> Why repair the complete data set in each cycle - even data which has not
>> changed since the last repair cycle?
>> 
>> This is where incremental repair comes in as an improvement. Once repaired,
>> data is marked as repaired so that the next repair cycle can focus on just
>> the delta. Now, let's go back to the example of a 5-node cluster with RF=3.
>> This time we run incremental repair on all nodes. When you repair the entire
>> data on node A, all 3 replicas are marked as repaired. Even if you run inc
>> repair on all ranges on the second node, you would not re-repair the already
>> repaired data. Thus, there is no advantage in repairing only the data owned
>> by the node (the node's primary range). You can run inc repair on all the
>> data present on a node, and Cassandra will make sure that when you repair
>> data on other nodes, you only repair unrepaired data.
>> 
>> Thanks
>> Anuj
>> 
>> 
>> 
>> Sent from Yahoo Mail on Android 
>> <https://overview.mail.yahoo.com/mobile/?.src=Android>
>> On Tue, Jun 6, 2017 at 4:27 PM, Chris Stokesmore
>> <chris.elsm...@

Re: Partition range incremental repairs

2017-06-09 Thread Chris Stokesmore
Hi Anuj,

Thanks for the reply.

1) We are using Cassandra 2.2.8, and the repair commands we are comparing are
"nodetool repair --in-local-dc --partitioner-range" and
"nodetool repair --in-local-dc".
Since 2.2 I believe incremental repairs are the default - that seems to be
confirmed in the logs that list the repair details when a repair starts.

2) From looking at a few runs, on average:
with -pr repairs, each node takes approx 6.5-8 hours, so a total over the 7
nodes of ~53 hours;
with just inc repairs, each node takes ~26-29 hours, so a total of ~193 hours
(rough arithmetic below).

3) We currently have two DCs in total: the 'production' ring with 7 nodes and
RF=3, and a testing ring with a single node and RF=1 for the single keyspace we
currently use.

4) Yeah, that number came from the Cassandra repair logs for an inc repair. I
can share the number reported when using a pr repair later this evening, when
the currently running repair has completed.
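
Spelling out the arithmetic for 2) - just a back-of-envelope check using rough
midpoints of the per-node times; the ~53 and ~193 hour figures above come from
summing the actual runs:

    # Back-of-envelope cluster totals for our 7-node ring (hours are rough
    # midpoints of the per-node times quoted in 2 above).
    nodes = 7
    pr_hours_per_node = (6.5 + 8) / 2     # ~7.25 h per node with -pr
    inc_hours_per_node = (26 + 29) / 2    # ~27.5 h per node with inc and no -pr
    print(nodes * pr_hours_per_node)      # ~51 h   (we measured ~53 h in total)
    print(nodes * inc_hours_per_node)     # ~192.5 h (we measured ~193 h in total)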


Many thanks for the reply again,

Chris


> On 6 Jun 2017, at 17:50, Anuj Wadehra <anujw_2...@yahoo.co.in> wrote:
> 
> Hi Chris,
> 
> Can you share the following info:
> 
> 1. Exact repair commands you use for inc repair and pr repair
> 
> 2. Repair time should be measured at cluster level for inc repair. So, what's
> the total time it takes to run repair on all nodes for incremental vs pr
> repairs?
> 
> 3. You are repairing one DC, DC3. How many DCs are there in total and what's
> the RF for your keyspaces? Running pr on a specific DC would not repair the
> entire data set.
> 
> 4. 885 ranges? Where did you get this number? The logs? Can you share the
> number of ranges printed in the logs for both the inc and pr cases?
> 
> 
> Thanks
> Anuj
> 
> 
> Sent from Yahoo Mail on Android 
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> On Tue, Jun 6, 2017 at 9:33 PM, Chris Stokesmore
> <chris.elsm...@demandlogic.co> wrote:
> Thank you for the excellent and clear description of the different versions
> of repair, Anuj - that has cleared up what I should expect to be happening.
> 
> The problem now is that in our cluster we are running repairs with options
> (parallelism: parallel, primary range: false, incremental: true, job threads:
> 1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges: 885), and
> these repairs are taking over a day to complete, whereas previously, when
> running with the partition range option, they were taking more like 8-9 hours.
> 
> As I understand it, using incremental should have sped this process up, as all
> three sets of data on each repair job should be marked as repaired; however,
> this does not seem to be the case. Any ideas?
> 
> Chris
> 
>> On 6 Jun 2017, at 16:08, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID 
>> <mailto:anujw_2...@yahoo.co.in.INVALID>> wrote:
>> 
>> Hi Chris,
>> 
>> Using pr with incremental repairs does not make sense. Primary range repair
>> is an optimization over full repair. If you run a full repair on an n-node
>> cluster with RF=3, you would be repairing each piece of data three times.
>> E.g. in a 5-node cluster with RF=3, a range may exist on nodes A, B and C.
>> When a full repair is run on node A, the entire data in that range gets synced
>> with the replicas on nodes B and C. Now, when you run full repair on nodes B
>> and C, you are wasting resources on repairing data which is already repaired.
>> 
>> Primary range repair ensures that when you run repair on a node, it ONLY
>> repairs the data which is owned by the node. Thus, no node repairs data
>> which it does not own and which must be repaired by another node. Redundant
>> work is eliminated.
>> 
>> Even with pr, each time you run pr on all nodes you repair 100% of the data.
>> Why repair the complete data set in each cycle - even data which has not
>> changed since the last repair cycle?
>> 
>> This is where incremental repair comes in as an improvement. Once repaired,
>> data is marked as repaired so that the next repair cycle can focus on just
>> the delta. Now, let's go back to the example of a 5-node cluster with RF=3.
>> This time we run incremental repair on all nodes. When you repair the entire
>> data on node A, all 3 replicas are marked as repaired. Even if you run inc
>> repair on all ranges on the second node, you would not re-repair the already
>> repaired data. Thus, there is no advantage in repairing only the data owned
>> by the node (the node's primary range). You can run inc repair on all the
>> data present on a node, and Cassandra will make sure that when you repair
>> data on other nodes, you only repair unrepaired data.
>> 
>> Thanks
>> Anuj
>> 
>> 
>> 
>> Sent from Ya

Re: Partition range incremental repairs

2017-06-06 Thread Chris Stokesmore
Thank you for the excellent and clear description of the different versions of
repair, Anuj - that has cleared up what I should expect to be happening.

The problem now is that in our cluster we are running repairs with options
(parallelism: parallel, primary range: false, incremental: true, job threads:
1, ColumnFamilies: [], dataCenters: [DC3], hosts: [], # of ranges: 885), and
these repairs are taking over a day to complete, whereas previously, when
running with the partition range option, they were taking more like 8-9 hours.

As I understand it, using incremental should have sped this process up, as all
three sets of data on each repair job should be marked as repaired; however,
this does not seem to be the case. Any ideas?
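
To check I have the model straight, here is a rough sketch I put together - purely a
toy model of the 5-node, RF=3 example you describe below, not how Cassandra actually
tracks ranges, and the node/range numbering is made up - of how much total work
full, -pr and incremental repair should each do across the ring:

    # Toy model: 5 nodes, RF=3; range i is replicated on nodes i, i+1, i+2 (mod 5).
    NODES, RF = 5, 3

    def replicas(rng):
        """Nodes holding a copy of range `rng`."""
        return {(rng + k) % NODES for k in range(RF)}

    def ranges_on(node):
        """Ranges for which `node` holds a replica (its primary range is `node`)."""
        return {r for r in range(NODES) if node in replicas(r)}

    # Full repair on every node: each node repairs every range it replicates.
    full_work = sum(len(ranges_on(n)) for n in range(NODES))   # 15: each range thrice

    # -pr on every node: each node repairs only its primary range.
    pr_work = NODES                                            # 5: each range once

    # Incremental (no -pr) on every node: ranges already marked repaired are skipped.
    repaired, inc_work = set(), 0
    for n in range(NODES):
        todo = ranges_on(n) - repaired
        inc_work += len(todo)
        repaired |= todo                                       # marked as repaired
    print(full_work, pr_work, inc_work)                        # 15 5 5

On that model a whole incremental cycle should cost about the same as a -pr cycle,
which is why the roughly 3x jump in total wall-clock time surprises me.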

Chris

> On 6 Jun 2017, at 16:08, Anuj Wadehra <anujw_2...@yahoo.co.in.INVALID> wrote:
> 
> Hi Chris,
> 
> Using pr with incremental repairs does not make sense. Primary range repair
> is an optimization over full repair. If you run a full repair on an n-node
> cluster with RF=3, you would be repairing each piece of data three times.
> E.g. in a 5-node cluster with RF=3, a range may exist on nodes A, B and C.
> When a full repair is run on node A, the entire data in that range gets synced
> with the replicas on nodes B and C. Now, when you run full repair on nodes B
> and C, you are wasting resources on repairing data which is already repaired.
> 
> Primary range repair ensures that when you run repair on a node, it ONLY
> repairs the data which is owned by the node. Thus, no node repairs data
> which it does not own and which must be repaired by another node. Redundant
> work is eliminated.
> 
> Even with pr, each time you run pr on all nodes you repair 100% of the data.
> Why repair the complete data set in each cycle - even data which has not
> changed since the last repair cycle?
> 
> This is where incremental repair comes in as an improvement. Once repaired,
> data is marked as repaired so that the next repair cycle can focus on just
> the delta. Now, let's go back to the example of a 5-node cluster with RF=3.
> This time we run incremental repair on all nodes. When you repair the entire
> data on node A, all 3 replicas are marked as repaired. Even if you run inc
> repair on all ranges on the second node, you would not re-repair the already
> repaired data. Thus, there is no advantage in repairing only the data owned
> by the node (the node's primary range). You can run inc repair on all the
> data present on a node, and Cassandra will make sure that when you repair
> data on other nodes, you only repair unrepaired data.
> 
> Thanks
> Anuj
> 
> 
> 
> Sent from Yahoo Mail on Android 
> <https://overview.mail.yahoo.com/mobile/?.src=Android>
> On Tue, Jun 6, 2017 at 4:27 PM, Chris Stokesmore
> <chris.elsm...@demandlogic.co> wrote:
> Hi all,
> 
> Wondering if anyone had any thoughts on this? At the moment the long-running
> repairs cause us to be running them on two nodes at once for a period of time,
> which obviously increases the cluster load.
> 
> On 2017-05-25 16:18 (+0100), Chris Stokesmore <c...@demandlogic.co 
> <mailto:c...@demandlogic.co>> wrote: 
> > Hi,
> > 
> > We are running a 7-node Cassandra 2.2.8 cluster, RF=3, and had been running
> > repairs with the -pr option, via a cron job that runs on each node once per
> > week.
> > 
> > We changed that as some advice on the Cassandra IRC channel said it would
> > cause more anticompaction, and
> > http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html
> > says 'Performing partitioner range repairs by using the -pr option is
> > generally considered a good choice for doing manual repairs. However, this
> > option cannot be used with incremental repairs (default for Cassandra 2.2 and
> > later)'.
> > 
> > The only problem is that our -pr repairs were taking about 8 hours, and now
> > the non-pr repairs are taking 24+ hours. I guess this makes sense - repairing
> > 1/7 of the data increased to 3/7 - except I was hoping to see a speed-up
> > after the first loop through the cluster, as each repair will be marking much
> > more data as repaired, right?
> > 
> > 
> > Is running -pr with incremental repairs really that bad?
> -
> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org 
> <mailto:user-unsubscr...@cassandra.apache.org>
> For additional commands, e-mail: user-h...@cassandra.apache.org 
> <mailto:user-h...@cassandra.apache.org>



Re: Partition range incremental repairs

2017-06-06 Thread Chris Stokesmore
Hi all,

Wondering if anyone had any thoughts on this? At the moment the long-running
repairs cause us to be running them on two nodes at once for a period of time,
which obviously increases the cluster load.

On 2017-05-25 16:18 (+0100), Chris Stokesmore <c...@demandlogic.co> wrote: 
> Hi,
> 
> We are running a 7-node Cassandra 2.2.8 cluster, RF=3, and had been running
> repairs with the -pr option, via a cron job that runs on each node once per
> week.
> 
> We changed that as some advice on the Cassandra IRC channel said it would
> cause more anticompaction, and
> http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html
> says 'Performing partitioner range repairs by using the -pr option is
> generally considered a good choice for doing manual repairs. However, this
> option cannot be used with incremental repairs (default for Cassandra 2.2 and
> later)'.
> 
> The only problem is that our -pr repairs were taking about 8 hours, and now the
> non-pr repairs are taking 24+ hours. I guess this makes sense - repairing 1/7
> of the data increased to 3/7 - except I was hoping to see a speed-up after the
> first loop through the cluster, as each repair will be marking much more data
> as repaired, right?
> 
> 
> Is running -pr with incremental repairs really that bad?
-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Partition range incremental repairs

2017-05-25 Thread Chris Stokesmore
Hi,

We are running a 7-node Cassandra 2.2.8 cluster, RF=3, and had been running
repairs with the -pr option, via a cron job that runs on each node once per
week.

We changed that as some advice on the Cassandra IRC channel said it would cause
more anticompaction, and
http://docs.datastax.com/en/archived/cassandra/2.2/cassandra/tools/toolsRepair.html
says "Performing partitioner range repairs by using the -pr option is
generally considered a good choice for doing manual repairs. However, this
option cannot be used with incremental repairs (default for Cassandra 2.2 and
later)."

The only problem is that our -pr repairs were taking about 8 hours, and now the
non-pr repairs are taking 24+ hours. I guess this makes sense - repairing 1/7 of
the data increased to 3/7 - except I was hoping to see a speed-up after the
first loop through the cluster, as each repair will be marking much more data as
repaired, right?
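
As a rough proportional sanity check of that scaling - assuming repair time simply
tracks the fraction of the data each node touches, which is obviously a
simplification:

    # With -pr each node repairs only its primary range, i.e. 1/7 of the ranges;
    # without -pr it repairs every range it replicates, i.e. RF/7 = 3/7.
    nodes, rf = 7, 3
    pr_hours = 8                                  # roughly what we saw per node with -pr
    print(pr_hours * (rf / nodes) / (1 / nodes))  # 24.0 h - about what we now see per node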


Is running -pr with incremental repairs really that bad?