Re: nodetool repair -pr
It depends on your data model. -pr repairs only the primary range of the node it runs on. So if there is a keyspace with replication 'DC2:3' and you run repair -pr only on the nodes of DC1, it is not going to repair the token ranges corresponding to DC2. So you will have to run it on each node.

-Arvinder

On Fri, Jun 8, 2018, 8:42 PM Igor Zubchenok wrote:

> According to the docs at
> http://cassandra.apache.org/doc/latest/operating/repair.html?highlight=single
>
> *The -pr flag will only repair the “primary” ranges on a node, so you can
> repair your entire cluster by running nodetool repair -pr on each node in
> a single datacenter.*
>
> But I saw many places where it is noted that I should run it at ALL data
> centers.
>
> Looking for a qualified answer.
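Arvinder's point can be checked with a toy model. The sketch below is not Cassandra's code: it assumes one hypothetical token per node, tokens interleaved across the two data centers, a rack-unaware replica walk, and that the "primary" for a range is the first replica the walk selects for that keyspace. Under those assumptions, running -pr only in DC1 covers half the ranges of a keyspace replicated to both DCs, and none of the ranges of a keyspace replicated only to DC2.

```python
# Hypothetical ring for the six nodes in the question, sorted by token.
# Range i is the slice of the ring that ends at RING[i]'s token.
RING = [
    (0, "nodeA", "DC1"), (10, "nodeD", "DC2"),
    (20, "nodeB", "DC1"), (30, "nodeE", "DC2"),
    (40, "nodeC", "DC1"), (50, "nodeF", "DC2"),
]

def replicas(range_index, rf_per_dc):
    """Walk clockwise from the range's owner, taking nodes until each DC has its RF."""
    chosen = []
    counts = {dc: 0 for dc in rf_per_dc}
    for step in range(len(RING)):
        _, node, dc = RING[(range_index + step) % len(RING)]
        if counts.get(dc, 0) < rf_per_dc.get(dc, 0):
            chosen.append(node)
            counts[dc] += 1
    return chosen

def ranges_repaired_with_pr(nodes_running_repair, rf_per_dc):
    """Indices of ranges whose first replica (assumed primary) ran repair -pr."""
    return {i for i in range(len(RING))
            if replicas(i, rf_per_dc)[0] in nodes_running_repair}

DC1_NODES = {"nodeA", "nodeB", "nodeC"}
ALL_NODES = {node for _, node, _ in RING}

print(ranges_repaired_with_pr(DC1_NODES, {"DC1": 3, "DC2": 3}))
print(ranges_repaired_with_pr(DC1_NODES, {"DC2": 3}))
print(ranges_repaired_with_pr(ALL_NODES, {"DC1": 3, "DC2": 3}))
```

Only the last call, where every node in every data center runs -pr, covers all six ranges in this model.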
Re: nodetool repair -pr
According to the docs at http://cassandra.apache.org/doc/latest/operating/repair.html?highlight=single

*The -pr flag will only repair the “primary” ranges on a node, so you can repair your entire cluster by running nodetool repair -pr on each node in a single datacenter.*

But I have seen many places where it is noted that I should run it in ALL data centers.

Looking for a qualified answer.

On Fri, 8 Jun 2018 at 18:08 Igor Zubchenok wrote:

> I want to repair all nodes at all data centers.
>
> Example:
> DC1
>   nodeA
>   nodeB
>   nodeC
> DC2
>   nodeD
>   nodeE
>   nodeF
>
> If I run `nodetool repair -pr` at nodeA nodeB and nodeC, will all ranges
> be repaired?
Re: nodetool repair -pr
I want to repair all nodes in all data centers.

Example:
DC1
  nodeA
  nodeB
  nodeC
DC2
  nodeD
  nodeE
  nodeF

If I run `nodetool repair -pr` on nodeA, nodeB and nodeC, will all ranges be repaired?

On Fri, 8 Jun 2018 at 17:57 Rahul Singh wrote:

> From DS dox: "Do not use -pr with this option to repair only a local
> data center."
> On Jun 8, 2018, 10:42 AM -0400, user@cassandra.apache.org, wrote:
>
> > *nodetool repair -pr*

--
Regards,
Igor Zubchenok
CTO at Multi Brains LLC
Founder of taxistartup.com saytaxi.com chauffy.com
Skype: igor.zubchenok
Re: nodetool repair -pr
From DS dox: "Do not use -pr with this option to repair only a local data center."

On Jun 8, 2018, 10:42 AM -0400, user@cassandra.apache.org, wrote:

> nodetool repair -pr
Re: Nodetool repair -pr
It will on 2.2 and higher, yes.

Also, I just want to point out that it would be worth comparing how long incremental repairs take vs full repairs in your cluster. There are some problems (which are fixed in 4.0) that can cause significant overstreaming when using incremental repair.

On September 28, 2017 at 11:46:47 AM, Dmitry Buzolin (dbuz5ga...@gmail.com) wrote:

> Hi All,
>
> Can someone confirm if "nodetool repair -pr -j2" does run with -inc too? I see the docs mention -inc is set by default, but I am not sure if it is enabled when -pr option is used.
>
> Thanks!
RE: nodetool repair -pr enough in this scenario?
Understand the simple mechanics first, decide how to act later.

Without -PR it makes no difference from which host you run repair: it runs for the whole 100% range, from start to end, the whole cluster, all nodes, at once.

With -PR it runs only for the primary range of the node on which you run the repair.

Let's say you have a simple ring of 3 nodes with RF=2 and ranges (per node) N1=C-A, N2=A-B, N3=B-C (node tokens are N1=A, N2=B, N3=C). No rack or DC awareness.

So running repair with -PR on node N2 will only repair the range A-B, for which N2 is the primary and N3 is the backup. N2 and N3 will synchronize the A-B range with each other. For the other ranges you need to run it on the other nodes.

Without -PR, running on any node will repair all ranges: A-B, B-C, C-A. The node on which you run a repair without -PR is just the repair coordinator, so it makes no difference which one it is next time.

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
Email: viktor.jevdoki...@adform.com
Phone: +370 5 212 3063, Fax +370 5 261 0453
J. Jasinskio 16C, LT-01112 Vilnius, Lithuania
Follow us on Twitter: @adforminsider http://twitter.com/#!/adforminsider
What is Adform: watch this short video http://vimeo.com/adform/display
[Adform News] http://www.adform.com

Disclaimer: The information contained in this message and attachments is intended solely for the attention and use of the named addressee and may be confidential. If you are not the intended recipient, you are reminded that the information remains the property of the sender. You must not use, disclose, distribute, copy, print or rely on this e-mail. If you have received this message in error, please contact the sender immediately and irrevocably delete this message and any copies.

From: David Daeschler [mailto:david.daesch...@gmail.com]
Sent: Tuesday, June 05, 2012 08:59
To: user@cassandra.apache.org
Subject: nodetool repair -pr enough in this scenario?

> Hello,
>
> Currently I have a 4 node cassandra cluster on CentOS64. I have been running nodetool repair (no -pr option) on a weekly schedule like:
>
> Host1: Tue, Host2: Wed, Host3: Thu, Host4: Fri
>
> In this scenario, if I were to add the -pr option, would this still be sufficient to prevent forgotten deletes and properly maintain consistency?
>
> Thank you,
> - David
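The -PR part of the three-node example can be written out as a small sketch. It assumes the simple placement described in the message (the primary owner plus the next RF-1 nodes clockwise, no rack or DC awareness); the function names are hypothetical and only model which replicas synchronize which range.

```python
# Ring from the message: node tokens N1=A, N2=B, N3=C, so each node's
# primary range ends at its own token. RF=2, no rack or DC awareness.
NODES = ["N1", "N2", "N3"]
PRIMARY_RANGE = {"N1": "C-A", "N2": "A-B", "N3": "B-C"}
RF = 2

def replicas_for_primary_range(node):
    """The primary owner plus the next RF-1 nodes clockwise on the ring."""
    start = NODES.index(node)
    return [NODES[(start + k) % len(NODES)] for k in range(RF)]

def repair_pr(node):
    """Range touched by `repair -pr` on `node` and the replicas that sync it."""
    return PRIMARY_RANGE[node], replicas_for_primary_range(node)

for n in NODES:
    rng, reps = repair_pr(n)
    print(f"repair -pr on {n}: range {rng}, synchronized between {reps}")
```

Running -pr once on each of the three nodes touches each of the three ranges exactly once, which is why it has to be run on every node.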
Re: nodetool repair -pr enough in this scenario?
In your case -pr would be just fine (see Viktor's explanation).

--
With kind regards,
Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E ro...@us2.nl
Re: nodetool repair -pr enough in this scenario?
On Tue, Jun 5, 2012 at 8:44 AM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote:

> Understand simple mechanics first, decide how to act later.
>
> Without -PR there's no difference from which host to run repair, it runs for the whole 100% range, from start to end, the whole cluster, all nodes, at once.

That's not exactly true. A repair without -pr will repair all the ranges of the node on which the repair is run, so it will only repair the ranges that the node is a replica for. It will *not* repair the whole cluster (unless the replication factor is equal to the number of nodes in the cluster, but that's a degenerate case). Hence it does matter on which host repair is run (it always matters, whether you use -pr or not).

In general you want to use repair without -pr when you want to repair a specific node. Typically, if a node was dead for a reasonably long time, you may want to run a repair (without -pr) on that specific node to have it catch up faster (faster than if you were relying only on read repair and hinted handoff).

For repairing a whole cluster, as is the case for the weekly scheduled repairs in the initial question, you want to use -pr. You *do not* want to use repair without -pr in that case, because for that task using -pr is more efficient (to be clear, not using -pr won't cause problems, but it is less efficient).

--
Sylvain
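Sylvain's correction can be illustrated on the same three-node ring. This is a minimal sketch that assumes simple placement (each range is stored on its primary owner and the next RF-1 nodes clockwise); the helper names are hypothetical.

```python
NODES = ["N1", "N2", "N3"]
PRIMARY_RANGE = {"N1": "C-A", "N2": "A-B", "N3": "B-C"}

def replicas_of(owner, rf):
    """Nodes holding the primary range of `owner`: the owner and the next rf-1 nodes."""
    start = NODES.index(owner)
    return [NODES[(start + k) % len(NODES)] for k in range(rf)]

def ranges_repaired_without_pr(node, rf):
    """Every range for which `node` is a replica, whether primary or not."""
    return sorted(PRIMARY_RANGE[owner] for owner in NODES
                  if node in replicas_of(owner, rf))

print(ranges_repaired_without_pr("N2", rf=2))  # N2's own range plus the one it backs up
print(ranges_repaired_without_pr("N2", rf=3))  # degenerate case: RF equals cluster size
```

With RF=2, a repair without -pr on N2 covers A-B and C-A but not B-C, so which host you choose does matter. Only when RF equals the number of nodes does one node cover the whole ring.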
RE: nodetool repair -pr enough in this scenario?
But in any case, repair is a two-way process? I mean that repair without -PR on node N1 will repair N1, N2 and N3, because N2 is a replica of N1's range and N1 is a replica of N3's range? And if there are more ranges that do not belong to N1, those ranges and nodes will not be repaired?

Do I understand correctly that "repair", with or without -PR, is not a "repair the selected node" process but a "synchronize data range(s) between replicas" process?

Single DC scenario:

With -PR: synchronize data for only the primary data range of the selected node between all nodes for that range (max number of nodes for the range = RF).

Without -PR: synchronize data for all data ranges of the selected node (primary and replica) between all nodes of those ranges (max number of nodes for the ranges = RF*RF). Not efficient, since the ranges overlap: the same ranges will be synchronized more than once (max = RF times).

Multiple DC scenario with 100% of the data range in each DC: the same, only RF = sum of the RFs from all DCs.

Is that correct?

Finally, is this process for SSTables only, excluding memtables and hints?

Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer

From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: Tuesday, June 05, 2012 11:02
To: user@cassandra.apache.org
Subject: Re: nodetool repair -pr enough in this scenario?

> That's not exactly true. A repair without -pr will repair all the ranges of the node on which repair is ran. So it will only repair the ranges that the node is a replica for.
Re: nodetool repair -pr enough in this scenario?
-pr is a new feature added in 1.0. It was added for efficiency, not functionality. With -pr, repair does 1/RF of the work it does without it.

> Am I understood correctly, that "repair" with or without -PR is not a "repair selected node" process, but "synchronize data range(s) between replicas" process?

Yes.

But if you have a node that has been down for a few hours you may want to get its primary range repaired quickly. Or, as Sylvain says, if you are running repair on every node in the cluster you can use -pr to reduce the duration of the repair operation. It would have the same effect as running repair without -pr on every RF'th node in the cluster.

Cheers

Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/06/2012, at 9:19 PM, Viktor Jevdokimov wrote:

> But in any case, repair is a two way process? I mean that repair without -PR on node N1 will repair N1 and N2 and N3, because N2 is a replica of N1 range and N1 is a replica of N3 range?
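The "1/RF of the work" and "every RF'th node" remarks can be checked with a little arithmetic. The sketch below assumes a single-DC ring with one primary range per node and simple placement, so node i holds its own range and the RF-1 ranges before it; the cluster size and RF used are hypothetical, and the functions only count ranges rather than model real repair cost.

```python
def range_repairs(num_nodes, rf, primary_only):
    """Number of (node, range) repair tasks when repair runs on every node."""
    ranges_per_node = 1 if primary_only else rf
    return num_nodes * ranges_per_node

def covered_by_every_rfth_node(num_nodes, rf):
    """Ranges covered by a repair without -pr on nodes 0, rf, 2*rf, ... of the ring."""
    covered = set()
    for node in range(0, num_nodes, rf):
        # Node i replicates its own range and the rf-1 ranges before it.
        covered.update((node - k) % num_nodes for k in range(rf))
    return covered

nodes, rf = 6, 3  # hypothetical cluster
full = range_repairs(nodes, rf, primary_only=False)
with_pr = range_repairs(nodes, rf, primary_only=True)
print(full, with_pr, with_pr / full)
print(covered_by_every_rfth_node(nodes, rf) == set(range(nodes)))
```

In this model, repairing every node without -pr processes each range RF times, while -pr on every node processes each range once, and a repair without -pr on every RF'th node also reaches every range.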
Re: nodetool repair -pr enough in this scenario?
Thank you for all the replies. It has been enlightening to read. I think I now have a better idea of repair, ranges, replicas and how the data is distributed. It also seems that using -pr would be the best way to go in my scenario with 1.x+.

Thank you for all the feedback. Glad to see such an active community around Cassandra.

- David