Agree.

However, if we go from a world where repairs don’t run (or run so unreliably 
that C* can’t mark the SSTables as repaired anyway) to a world where repairs run 
more reliably (Spark / Tickler approach), the impact on tombstone removal 
doesn’t get any worse (because the SSTables aren’t marked as repaired either way).

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Wednesday, October 12, 2016 9:25 AM
To: user@cassandra.apache.org
Subject: Re: Repair in Multi Datacenter - Should you use -dc Datacenter repair 
or repair with -pr

Note that the tickle approach doesn’t mark sstables as repaired (it’s a simpler 
version of mutation-based repair, in a sense), so Cassandra has no way to prove 
that the data has been repaired.

Given tickets like https://issues.apache.org/jira/browse/CASSANDRA-6434, this 
has implications for tombstone removal.


From: Anubhav Kale <anubhav.k...@microsoft.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Wednesday, October 12, 2016 at 9:17 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: RE: Repair in Multi Datacenter - Should you use -dc Datacenter repair 
or repair with -pr

The default repair process doesn’t usually work at scale, unfortunately.

Depending on your data size, you have the following options.


Netflix Tickler: https://github.com/ckalantzis/cassTickler
 (Continuously reads at CL.ALL via CQL :: Python)

Spotify Reaper: https://github.com/spotify/cassandra-reaper
 (Subrange repair; provides a REST endpoint and calls repair APIs through JMX :: Java)

List subranges: https://github.com/pauloricardomg/cassandra-list-subranges
 (Tool to get the subranges for a given node :: Java)

Subrange repair: https://github.com/BrianGallew/cassandra_range_repair
 (Tool to run subrange repair :: Python)

Mutation-based repair (not ready yet): 
https://issues.apache.org/jira/browse/CASSANDRA-8911
 (C* is considering doing this - hot off the press)
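For illustration, the core idea behind the subrange tools above can be sketched in a few lines: split the Murmur3 token ring into evenly sized slices and repair each slice independently. This is a minimal sketch, not code from any of the tools listed; the token bounds are standard Murmur3Partitioner facts, and the rest is illustrative.

```python
# Minimal sketch: split the Murmur3 token ring into equal subranges.
# Each (start, end) slice could then be repaired one at a time, e.g. via
# `nodetool repair -st <start> -et <end>`.

MURMUR3_MIN = -2**63      # Murmur3Partitioner minimum token
MURMUR3_MAX = 2**63 - 1   # Murmur3Partitioner maximum token

def split_ring(num_splits):
    """Return a list of (start, end) token subranges covering the ring."""
    span = (MURMUR3_MAX - MURMUR3_MIN) // num_splits
    ranges = []
    start = MURMUR3_MIN
    for i in range(num_splits):
        # The last slice absorbs integer-division rounding so the
        # whole ring is covered.
        end = MURMUR3_MAX if i == num_splits - 1 else start + span
        ranges.append((start, end))
        start = end
    return ranges

for st, et in split_ring(4):
    print(f"repair -st {st} -et {et}")
```

The real tools additionally intersect these slices with the ranges a given node actually owns, which is what cassandra-list-subranges reports.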

If you have Spark in your system, you could use it to do what the Netflix 
Tickler does. We’re experimenting with this, and it seems to be the best fit 
for our datasets out of all the options.
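As a rough sketch of the tickle pattern being described: walk every partition key and re-read it at consistency ALL, which forces read repair on any divergent replica. All names below are hypothetical; a real version would use a DataStax-driver Session with ConsistencyLevel.ALL set on the statement, and the stub session exists only so the sketch runs without a cluster.

```python
# Sketch of the "tickle" loop: re-read every partition at CL.ALL so that
# read repair fixes any inconsistent replicas as a side effect.
# `session` is any object with an execute(query, params) method; in
# practice it would be a cassandra-driver Session whose statements have
# consistency_level=ConsistencyLevel.ALL.

def tickle(session, keyspace, table, key_column, keys):
    """Re-read each partition at ALL; return the number of rows touched."""
    touched = 0
    for key in keys:
        # The SELECT itself is what triggers read repair at CL.ALL.
        session.execute(
            f"SELECT {key_column} FROM {keyspace}.{table} "
            f"WHERE {key_column} = %s",
            (key,),
        )
        touched += 1
    return touched

# Tiny stand-in session so the sketch is runnable without a cluster.
class StubSession:
    def __init__(self):
        self.queries = []
    def execute(self, query, params):
        self.queries.append((query, params))

session = StubSession()
n = tickle(session, "ks", "users", "id", [1, 2, 3])
print(n)  # 3 partitions tickled
```

A Spark version of the same idea would distribute the key scan across executors instead of looping in one process.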

From: Leena Ghatpande [mailto:lghatpa...@hotmail.com]
Sent: Wednesday, October 12, 2016 7:16 AM
To: user@cassandra.apache.org
Subject: Repair in Multi Datacenter - Should you use -dc Datacenter repair or 
repair with -pr


Please advise. I cannot find any clear documentation on the best strategy for 
repairing nodes on a regular basis when multiple datacenters are involved.



We are running Cassandra 3.7 in a multi-datacenter setup with 4 nodes in each 
data center. We run repairs every other night to keep the nodes in a good 
state. We currently run repair with the -pr option, but the repair process 
hangs and does not complete gracefully. We don't see any errors in the logs 
either.



What is the best way to perform repairs on large tables across multiple data 
centers?

1. Can we run a datacenter repair using the -dc option for each data center? Do 
we need to run the repair on each node in that case, or will it repair all 
nodes within the datacenter?

2. Is running repair with -pr across all nodes required if we perform step 1 
every night?

3. Is cross-data-center repair required, and if so, what's the best option?
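For concreteness, the invocations being discussed look roughly like this (keyspace and DC names are placeholders):

```shell
# Datacenter-scoped repair: restricts repair traffic to one DC.
# It must still be started on each node whose ranges you want repaired;
# it does not repair the whole DC from a single node.
nodetool repair -dc DC1 my_keyspace

# Primary-range repair: each node repairs only the ranges it owns as
# primary, so it must be run on every node in every DC for full coverage.
nodetool repair -pr my_keyspace

# Subrange repair (what the tools mentioned earlier automate): repair
# one small token slice at a time.
nodetool repair -st -9223372036854775808 -et -4611686018427387905 my_keyspace
```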



Thanks



Leena



