[jira] [Commented] (CASSANDRA-16245) Implement repair quality test scenarios

Alexander Dejanovski (Jira) Thu, 10 Dec 2020 05:51:06 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-16245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17247249#comment-17247249 ]


Alexander Dejanovski commented on CASSANDRA-16245:
--------------------------------------------------

Hi [~zvo], 

Awesome stuff so far!

I've pushed a GitHub Actions workflow which spins up and tears down a 3-node
cluster in AWS using m5ad.xlarge instances (4 vCPUs, 16GB RAM and 150GB of
directly attached storage).
Each instance provides a 140GB SSD drive, which tlp-cluster mounts as
{{/var/lib/cassandra}}.
Let's start with a dataset of 100GB per node for our testing, which should be 
good enough for now.

The test suite needs to be adjusted to target the "real" cluster instead of a
ccm one. tlp-cluster provides environment variables with each node's public IP
in the {{env.sh}} file ({{source env.sh}} sets them along with the other
tlp-cluster aliases).
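Here is a minimal sketch of how the suite could pick up the node IPs once {{env.sh}} has been sourced. The variable names below are placeholders, because I'm not sure of the exact names tlp-cluster writes. The sketch also assumes the variables are exported, so that child processes such as the test runner can see them.

```python
import os
from typing import List, Mapping, Sequence

# Hypothetical names: replace them with the variables tlp-cluster's env.sh actually defines.
DEFAULT_NODE_VARS = ("NODE_0_IP", "NODE_1_IP", "NODE_2_IP")


def contact_points_from_env(
    var_names: Sequence[str] = DEFAULT_NODE_VARS,
    environ: Mapping[str, str] = os.environ,
) -> List[str]:
    """Return one public IP per node, read from (exported) environment variables.

    Raises instead of returning a partial list, so the suite cannot silently
    run against fewer nodes or fall back to a local ccm cluster.
    """
    missing = [name for name in var_names if not environ.get(name, "").strip()]
    if missing:
        raise RuntimeError(
            "missing node IP variable(s) %s; was `source env.sh` run and are they exported?"
            % ", ".join(missing)
        )
    return [environ[name].strip() for name in var_names]
```

The returned list could then be passed as contact points to whichever client the suite uses, for instance the DataStax Python driver's {{Cluster(contact_points=...)}}.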

Could you rename the branch you're working on to {{CASSANDRA-16245}}?

Let me know if you have what you need to move this forward.

> Implement repair quality test scenarios
> ---------------------------------------
>
>                 Key: CASSANDRA-16245
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16245
>             Project: Cassandra
>          Issue Type: Task
>          Components: Test/dtest/java
>            Reporter: Alexander Dejanovski
>            Assignee: Radovan Zvoncek
>            Priority: Normal
>             Fix For: 4.0-rc
>
>
> Implement the following test scenarios in a new test suite for repair
> integration testing under significant load:
> Generate/restore a workload of ~100GB per node. Medusa should be considered
> for creating the initial backup, which could then be restored from an S3
> bucket to speed up node population.
> The data should deliberately require repair and be generated accordingly.
> Perform repairs on a 3-node cluster with 4 cores and 16GB-32GB RAM per node
> (m5d.xlarge instances would be the most cost-efficient type).
> Repaired keyspaces will use RF=3, or RF=2 in some cases (the latter is for
> subranges with different sets of replicas).
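With RF=3 on a 3-node cluster, every token range is replicated on all three nodes, so any set of subranges shares the same replicas. RF=2 is what allows picking subranges whose replica sets differ. The sketch below assumes the Murmur3Partitioner token space. It splits a non-wrapping range into N equal subranges and predicts how many repair sessions a multi-range repair should create, assuming one session per distinct replica set. {{replicas_for}} is a hypothetical callback standing in for a real lookup, for example parsed {{nodetool describering}} output.

```python
from collections import defaultdict
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

# Murmur3Partitioner token bounds.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1

TokenRange = Tuple[int, int]  # (start, end]: start exclusive, end inclusive


def split_range(start: int, end: int, parts: int) -> List[TokenRange]:
    """Split the non-wrapping range (start, end] into `parts` contiguous subranges."""
    if not (MIN_TOKEN <= start < end <= MAX_TOKEN):
        raise ValueError("expected a non-wrapping range within the Murmur3 token space")
    width = end - start
    if parts < 1 or parts > width:
        raise ValueError("parts must be between 1 and the width of the range")
    # Integer arithmetic keeps the bounds exact even for 64-bit tokens.
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(parts)]


def expected_session_count(
    subranges: Sequence[TokenRange],
    replicas_for: Callable[[TokenRange], Sequence[str]],
) -> int:
    """Count distinct replica sets (assumption: one repair session per replica set)."""
    groups: Dict[FrozenSet[str], List[TokenRange]] = defaultdict(list)
    for rng in subranges:
        groups[frozenset(replicas_for(rng))].append(rng)
    return len(groups)
```

For the "same replicas" scenario the count should be 1. The "different replicas" scenario only exercises what it intends to if the count is greater than 1.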
> ||Mode||Version||Settings||Checks||
> |Full repair|trunk|Sequential + all token ranges|No anticompaction (repairedAt == 0); out of sync ranges > 0; a subsequent run must show no out of sync ranges|
> |Full repair|trunk|Parallel + primary range|No anticompaction (repairedAt == 0); out of sync ranges > 0; a subsequent run must show no out of sync ranges|
> |Full repair|trunk|Force terminate repair shortly after it was triggered|Repair threads must be cleaned up|
> |Subrange repair|trunk|Sequential + single token range|No anticompaction (repairedAt == 0); out of sync ranges > 0; a subsequent run must show no out of sync ranges|
> |Subrange repair|trunk|Parallel + 10 token ranges which have the same replicas|No anticompaction (repairedAt == 0); out of sync ranges > 0; a subsequent run must show no out of sync ranges; a single repair session will handle all subranges at once|
> |Subrange repair|trunk|Parallel + 10 token ranges which have different replicas|No anticompaction (repairedAt == 0); out of sync ranges > 0; a subsequent run must show no out of sync ranges; more than one repair session is triggered to process all subranges|
> |Subrange repair|trunk|Single token range; force terminate repair shortly after it was triggered|Repair threads must be cleaned up|
> |Incremental repair|trunk|Parallel (mandatory); no compaction during repair|Anticompaction status (repairedAt != 0) on all SSTables; no pending repair on SSTables after completion (this happens asynchronously, so the check may need to wait a bit); out of sync ranges > 0, and a subsequent run must show no out of sync ranges|
> |Incremental repair|trunk|Parallel (mandatory); major compaction triggered during repair|Anticompaction status (repairedAt != 0) on all SSTables; no pending repair on SSTables after completion (this happens asynchronously, so the check may need to wait a bit); out of sync ranges > 0, and a subsequent run must show no out of sync ranges|
> |Incremental repair|trunk|Force terminate repair shortly after it was triggered|Repair threads must be cleaned up|

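The SSTable-level checks from the table could be automated by parsing {{sstablemetadata}} output collected from each node (e.g. over SSH), one dump per SSTable. This is a sketch only. It assumes lines of the form {{Repaired at: <timestamp>}} and {{Pending repair: <session id>}} (with "--" when there is none), which should be checked against the trunk build under test. Pending repair status is cleared asynchronously after an incremental repair, so the incremental check is meant to be polled with a timeout rather than evaluated once.

```python
import re
import time
from typing import Callable, Iterable, Optional

# Assumed sstablemetadata line formats; verify against the trunk build under test.
REPAIRED_AT = re.compile(r"^[ \t]*Repaired at:[ \t]*(-?\d+)", re.MULTILINE)
PENDING_REPAIR = re.compile(r"^[ \t]*Pending repair:[ \t]*(\S*)", re.MULTILINE)
NO_PENDING = {"", "--", "null"}


def repaired_at(metadata_output: str) -> int:
    """Extract repairedAt (0 means unrepaired) from one SSTable's metadata dump."""
    match = REPAIRED_AT.search(metadata_output)
    if match is None:
        raise ValueError("no 'Repaired at' line in sstablemetadata output")
    return int(match.group(1))


def pending_repair(metadata_output: str) -> Optional[str]:
    """Return the pending repair session id, or None if the SSTable has none."""
    match = PENDING_REPAIR.search(metadata_output)
    if match is None or match.group(1) in NO_PENDING:
        return None
    return match.group(1)


def no_anticompaction(outputs: Iterable[str]) -> bool:
    """Full/subrange repair check: every SSTable still has repairedAt == 0."""
    outputs = list(outputs)
    return bool(outputs) and all(repaired_at(o) == 0 for o in outputs)


def fully_anticompacted(outputs: Iterable[str]) -> bool:
    """Incremental repair check: every SSTable is repaired and none is pending."""
    outputs = list(outputs)
    return bool(outputs) and all(
        repaired_at(o) != 0 and pending_repair(o) is None for o in outputs
    )


def wait_until(condition: Callable[[], bool], timeout_s: float = 300.0,
               interval_s: float = 5.0) -> bool:
    """Poll `condition` until it holds (True) or `timeout_s` elapses (False)."""
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For example: {{wait_until(lambda: fully_anticompacted(collect_metadata()), timeout_s=600)}}. Here {{collect_metadata}} is a hypothetical helper returning the current dumps for every SSTable of the repaired tables. An empty list is treated as a failure, so a misconfigured collector can't make a check pass vacuously.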

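For the "out of sync ranges > 0, then none on a subsequent run" checks, one option is to count the out-of-sync ranges reported in each node's Cassandra log during each run. The pattern below assumes the sync step logs lines like {{Endpoints /10.0.0.1:7000 and /10.0.0.2:7000 have 3 range(s) out of sync for tbl}}, and "are consistent" when nothing differs. The exact wording, and whether the table name is keyspace-qualified, should be verified against trunk.

```python
import re
from typing import Iterable, Optional

# Assumed wording of the repair sync log message; verify against trunk before relying on it.
OUT_OF_SYNC = re.compile(
    r"Endpoints \S+ and \S+ have (\d+) range\(s\) out of sync for (\S+)"
)


def out_of_sync_ranges(log_lines: Iterable[str], table: Optional[str] = None) -> int:
    """Sum the out-of-sync range counts reported in the given repair log lines.

    Counts are per replica pair, so a range that differs on several pairs is
    counted more than once; that is fine for a "> 0" / "== 0" check.
    """
    total = 0
    for line in log_lines:
        match = OUT_OF_SYNC.search(line)
        if match and (table is None or match.group(2) == table):
            total += int(match.group(1))
    return total


def repair_converged(first_run: Iterable[str], second_run: Iterable[str],
                     table: Optional[str] = None) -> bool:
    """The first run must find out-of-sync ranges and the subsequent run none."""
    return (out_of_sync_ranges(first_run, table) > 0
            and out_of_sync_ranges(second_run, table) == 0)
```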

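For the force-terminate scenarios, "repair threads must be cleaned up" could be verified by comparing thread dumps (e.g. from {{jstack <pid>}}). One dump would be taken before the repair and another some time after terminating it, for instance through the {{forceTerminateAllRepairSessions}} JMX operation. The thread-name prefixes below are assumptions based on names seen in Cassandra logs and should be adjusted to what trunk actually uses.

```python
import re
from typing import Iterable, List

# jstack prints each thread header at column 0 as: "thread name" #12 daemon prio=5 ...
THREAD_HEADER = re.compile(r'^"([^"]+)"', re.MULTILINE)

# Assumed prefixes of repair-related thread names; adjust to what trunk actually uses.
REPAIR_THREAD_PREFIXES = ("Repair#", "RepairJobTask")


def repair_threads(thread_dump: str,
                   prefixes: Iterable[str] = REPAIR_THREAD_PREFIXES) -> List[str]:
    """Sorted names of threads in a jstack dump that look repair-related."""
    prefixes = tuple(prefixes)
    return sorted(name for name in THREAD_HEADER.findall(thread_dump)
                  if name.startswith(prefixes))


def leftover_repair_threads(before: str, after: str) -> List[str]:
    """Repair threads present after termination that were not there beforehand."""
    baseline = set(repair_threads(before))
    return [name for name in repair_threads(after) if name not in baseline]
```

Idle pool threads may take a while to exit, so it is probably worth retaking the second dump a few times before declaring a leak.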
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

