[jira] [Updated] (CASSANDRA-19918) Automated Repair Inside Cassandra

Jaydeepkumar Chovatia (Jira) Fri, 07 Mar 2025 08:56:23 -0800


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-19918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jaydeepkumar Chovatia updated CASSANDRA-19918:
----------------------------------------------
    Description: 
h1. Motivation

Anti-entropy (Apache Cassandra repair) is essential for every Apache Cassandra 
cluster to fix data inconsistencies. Frequent data deletions and downed nodes 
are common causes of data inconsistency. Because many large users have had to 
build their own scalable repair solution, a few open-source orchestration 
tools that trigger repair externally are available. However, having multiple 
custom solutions has caused a lot of confusion in the community. Repair, like 
compaction, should therefore be an integral part of Cassandra for it to be a 
complete solution.

 

The proposal is to align on one of the existing solutions and make it part of 
core Cassandra. Here is the design for one such solution:

 

Inside Cassandra, there are multiple repairs we would have to schedule:

1) Full repair

2) Incremental repair

3) Paxos repair

 

The scheduler should be designed so that it can be extended to additional 
repair categories with minimal code changes, and all repair types should 
progress automatically with minimal manual intervention. 
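
One way to picture this extensibility goal is a scheduler driven by a per-type configuration map, so that supporting another repair category means adding one entry rather than new scheduling logic. The following is a minimal sketch under that assumption. The names RepairType, RepairTypeConfig, default_configs, and due_repair_types, and all of the settings, are hypothetical and are not taken from the CEP or the PRs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class RepairType(Enum):
    """The three repair categories listed above (hypothetical enum)."""
    FULL = "full"
    INCREMENTAL = "incremental"
    PAXOS = "paxos"


@dataclass(frozen=True)
class RepairTypeConfig:
    """Illustrative per-type settings; the field names are assumptions."""
    enabled: bool = False
    min_interval_hours: float = 24.0


def default_configs() -> Dict[RepairType, RepairTypeConfig]:
    """Start with every repair type present but disabled."""
    return {repair_type: RepairTypeConfig() for repair_type in RepairType}


def due_repair_types(
    configs: Dict[RepairType, RepairTypeConfig],
    hours_since_last: Dict[RepairType, float],
) -> List[RepairType]:
    """Return enabled repair types whose minimum interval has elapsed.

    A type with no recorded run is treated as due.
    """
    due = []
    for repair_type in RepairType:
        config = configs[repair_type]
        elapsed = hours_since_last.get(repair_type, float("inf"))
        if config.enabled and elapsed >= config.min_interval_hours:
            due.append(repair_type)
    return due
```

In this sketch the scheduling loop never names a specific repair type, so adding a fourth category would only require a new enum member and its configuration.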

Migrating to incremental repair[[1|https://stackoverflow.com/questions/42182984/how-do-i-enable-incremental-repair-on-cassandra-2-1-13]]
 (and rolling back from it) has been extremely challenging, especially in a 
large fleet. One of the design principles is to make this almost touchless 
from the operator’s point of view.
h1. The Scheduler

With the above motivation in mind, this design moves repair orchestration 
inside Cassandra itself, so that Cassandra repairs the entire ring. 

At a high level, a dedicated thread pool is assigned to the repair scheduler. 
The repair scheduler inside Cassandra maintains a new replicated table under 
the distributed _system_distributed_ keyspace. This table holds the repair 
history for all the nodes, such as when each node was last repaired. The 
scheduler picks the node(s) that run repair first and continues orchestration 
to ensure that every table and all of its token ranges are repaired. The 
algorithm can also run repairs on multiple nodes simultaneously, and it splits 
the token range into subranges with the necessary retries to handle transient 
failures. Over time, the automatic repair has become so reliable that it runs 
as soon as we start a Cassandra cluster, like compaction, and does not require 
manual intervention. 
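
The three mechanisms described above (choosing which nodes repair next from the repair history, splitting a token range into subranges, and retrying transient failures) can be sketched as follows. This is an illustrative sketch only, not the code in the PRs. Every function name, the TransientRepairError class, the in-memory dictionary standing in for the history table, and the assumption that ranges do not wrap around the ring are hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple

TokenRange = Tuple[int, int]  # (start, end] with start < end; no wrap-around


class TransientRepairError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def pick_next_nodes(last_repaired: Dict[str, float], parallelism: int = 1) -> List[str]:
    """Pick the nodes that have gone longest without a repair.

    `last_repaired` maps node id to the timestamp of its last repair and
    stands in for the repair history table. Ties are broken by node id.
    """
    ordered = sorted(last_repaired, key=lambda node: (last_repaired[node], node))
    return ordered[:parallelism]


def split_range(start: int, end: int, pieces: int) -> List[TokenRange]:
    """Split (start, end] into at most `pieces` contiguous subranges."""
    if start >= end:
        raise ValueError("start must be smaller than end")
    width = end - start
    pieces = max(1, min(pieces, width))  # never create empty subranges
    bounds = [start + (width * i) // pieces for i in range(pieces + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def repair_with_retry(
    repair_fn: Callable[[TokenRange], None],
    subrange: TokenRange,
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> bool:
    """Run `repair_fn` on one subrange, retrying transient failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            repair_fn(subrange)
            return True
        except TransientRepairError:
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)  # linear backoff
    return False


def repair_node_range(
    repair_fn: Callable[[TokenRange], None],
    token_range: TokenRange,
    pieces: int = 4,
) -> List[TokenRange]:
    """Repair a range piece by piece; return the subranges that still failed."""
    subranges = split_range(token_range[0], token_range[1], pieces)
    return [sub for sub in subranges if not repair_with_retry(repair_fn, sub)]
```

Splitting into subranges keeps each unit of work small, so in this sketch a transient failure costs one subrange retry rather than a restart of the whole range.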

Because this repair scheduler is fully automated inside Cassandra, there is no 
dependency on the control plane, which significantly reduces our operational 
overhead.

*CEP:* 
[https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution]
h2. Detailed Design Doc 

[Automated Repair in 
Cassandra|https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0]
h2. PR (on 4.1.6) (Last active: Sep 2024)

Many people are currently running 4.1.6 in production, so the following PR 
against 4.1.6 should make it easier for everybody to review and test the code. 
If the community decides to accept this CEP, the change will land on _trunk_ 
rather than {_}4.1{_}.

[https://github.com/apache/cassandra/pull/3367/]
h2. PR (on {_}trunk{_}) (Last active: Sep 2024)

[https://github.com/apache/cassandra/pull/3598]

 
h2. Discussion over Slack

[[1]|https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619] 
[[2]|http://cassandra-repair-scheduling-cep37/]

  was:
h1. Motivation

Anti-entropy (Apache Cassandra repairs) is essential for every Apache Cassandra 
cluster to fix data inconsistencies. Frequent data deletions and downed nodes 
are common causes of data inconsistency. A few open-source orchestration 
solutions that trigger repair externally are available, as many large users 
have needed to figure out a scalable repair solution. However, multiple custom 
solutions have led to a lot of confusion in the community. Therefore, the 
repair activity, like Compaction, should be an integral part of Cassandra to 
call it a complete solution.

 

The proposal is to align one solution among the existing solutions and make it 
part of the core Cassandra. Here is the design for one of the solutions:

 

Inside Cassandra, there are multiple repairs we would have to schedule:

1) Full repair

2) Incremental Repair 

3) Paxos repair

 

The design of the scheduler should be capable of extending multiple repair 
categories with a minimal code change, and all repair types should progress 
automatically with minimal manual intervention. 

Migrating[[1|https://stackoverflow.com/questions/42182984/how-do-i-enable-incremental-repair-on-cassandra-2-1-13]]
 (and rollback) to/from incremental repair has been extremely challenging, 
especially in a large fleet. One of the design principles is to make it almost 
touchless from the operator’s point of view.
h1. The Scheduler

Keeping the above motivation in mind, this design embarks on our journey to 
have the repair orchestration inside Cassandra itself, which will repair the 
entire ring. 

A dedicated thread pool is assigned to the repair scheduler at a higher level. 
The repair scheduler inside Cassandra maintains a new replicated table under a 
distributed _system_distributed_ keyspace. This table maintains the repair 
history for all the nodes, such as when it was repaired the last time, etc. The 
scheduler will pick the node(s) that run the repair first and continue 
orchestration to ensure Every table and all of their token ranges are repaired. 
The algorithm can also run repairs simultaneously on multiple nodes and splits 
the token range into subranges with the necessary retry to handle transient 
failures. Over the period, the automatic repair has become so reliable that it 
runs as soon as we start a Cassandra cluster, like Compaction, and does not 
require manual intervention. 

Due to this fully automated repair scheduler inside Cassandra, there is no 
dependency on the control plane, significantly reducing our operational 
overhead.

*CEP:* 
[https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-37+%28DRAFT%29+Apache+Cassandra+Unified+Repair+Solution]
h2. Detailed Design Doc 

[Automated Repair in 
Cassandra|https://docs.google.com/document/d/1CJWxjEi-mBABPMZ3VWJ9w5KavWfJETAGxfUpsViPcPo/edit#heading=h.r112r46toau0]
h2. PR (on 4.1.6) (Last active: Sep 2024)

Many folks currently are using 4.1.6 in production. Hence, the following PR on 
4.1.6 will make it easier for everybody to review the code, test, etc.  If the 
community decides to merge this CEP, then it will land on the _trunk_ as 
opposed to {_}4.1{_}.

[https://github.com/apache/cassandra/pull/3367/]
h2. PR (on {_}trunk{_}) (Last active: Sep 2024)

[https://github.com/apache/cassandra/pull/3598]
h2. PR (dtest) (Last active: Oct 2024)

[https://github.com/apache/cassandra-dtest/pull/270]
h2. Discussion over Slack

[[1]|https://the-asf.slack.com/archives/CK23JSY2K/p1690225062383619] 
[[2]|http://cassandra-repair-scheduling-cep37/]


> Automated Repair Inside Cassandra
> ---------------------------------
>
>                 Key: CASSANDRA-19918
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19918
>             Project: Apache Cassandra
>          Issue Type: Epic
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal

