[jira] [Updated] (CASSANDRA-17342) Performance problem for node restart with incremental range repairs

Marcus Eriksson (Jira) Fri, 11 Feb 2022 05:18:15 -0800


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-17342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Marcus Eriksson updated CASSANDRA-17342:
----------------------------------------
          Fix Version/s: 4.0.3
                             (was: 4.0.x)
          Since Version: 4.0.0
    Source Control Link: 
https://github.com/apache/cassandra/commit/c60ad61b3b6145af100578f2c652819f61729018
             Resolution: Fixed
                 Status: Resolved  (was: Ready to Commit)

Committed to 4.0 and merged up, thanks again for the patch!

The trunk tests look bad, but the failures are similar to those on [non-patched 
trunk|https://app.circleci.com/pipelines/github/krummas/cassandra/775/workflows/b0ede5ae-db7c-4a1d-b6ff-22245922bb46]

[circleci 
4.0|https://app.circleci.com/pipelines/github/krummas/cassandra/770/workflows/edfe8c85-0de6-4191-b4be-e7c4cb1a4c1e]
[circleci 
trunk|https://app.circleci.com/pipelines/github/krummas/cassandra/769/workflows/6eea562c-0354-41e2-b253-32da2f929193]

> Performance problem for node restart with incremental range repairs
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-17342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17342
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair
>            Reporter: Paul Chandler
>            Assignee: Paul Chandler
>            Priority: Normal
>             Fix For: 4.0.3
>
>         Attachments: BulkRepairStateTest.java, 
> IncrementalRepairStartupTest.java, LocalSessions.java, RepairedState.java
>
>
> There is a performance problem when restarting Cassandra on clusters that run
> incremental repairs with range repairs.
> We have clusters with 16 vnodes per node, and we split each vnode into
> 100 ranges. This causes a node to take over 30 minutes to process the data
> stored in the system.repairs table before the node can restart. Even when we
> reduce this to 10 ranges per vnode, it still takes 2 minutes to process. The
> cluster has 22 keyspaces and a replication factor of 3, which creates around
> 8100 records in the system.repairs table.
>  
> The problem seems to occur in the
> org.apache.cassandra.repair.consistent.RepairedState class, where the add
> method reprocesses the complete list, including sorting, every time a new
> Range is added. This leads to exponential growth in processing time, as
> demonstrated in the attached unit test.
>  
> I have created a change that collects the data read from the
> system.repairs table in the
> org.apache.cassandra.repair.consistent.LocalSessions class and then processes
> it as a group at the end. This reduces the processing time to a couple of
> seconds, even for the 100-range version.
>  
> This is my first attempt at changing the Cassandra code, so I need a
> mentor to help me with the process and validate what I have done.
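
For readers who want a feel for why batching helps, here is a minimal,
self-contained Java sketch of the pattern described in the report. It is not
the committed patch and does not use Cassandra's classes: the class
RangeBatchingSketch, its methods, and the {left, right} token-pair model of a
range are all hypothetical names invented for illustration. It only contrasts
re-sorting the full list on every add with collecting everything and sorting
once, using the number of elements handed to sort() as a rough cost proxy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: none of these names exist in Cassandra.
class RangeBatchingSketch {
    // Orders ranges, modelled here as {left, right} token pairs, by left token.
    static final Comparator<long[]> BY_LEFT =
            Comparator.comparingLong((long[] r) -> r[0]);

    // Sorted ranges plus a rough cost proxy: total elements handed to sort().
    static final class Result {
        final List<long[]> sorted;
        final long elementsSorted;

        Result(List<long[]> sorted, long elementsSorted) {
            this.sorted = sorted;
            this.elementsSorted = elementsSorted;
        }
    }

    // Re-sorts the whole list after every single insertion.
    static Result addOneAtATime(List<long[]> incoming) {
        List<long[]> state = new ArrayList<>();
        long cost = 0;
        for (long[] range : incoming) {
            state.add(range);
            state.sort(BY_LEFT);
            cost += state.size();
        }
        return new Result(state, cost);
    }

    // Collects everything first and sorts once at the end.
    static Result addAsBatch(List<long[]> incoming) {
        List<long[]> state = new ArrayList<>(incoming);
        state.sort(BY_LEFT);
        return new Result(state, state.size());
    }

    // Builds n non-overlapping ranges in descending order of left token.
    static List<long[]> sampleRanges(int n) {
        List<long[]> ranges = new ArrayList<>();
        for (int i = n; i > 0; i--) {
            ranges.add(new long[] { i * 10L, i * 10L + 5 });
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 8100 is the approximate record count quoted in the report.
        List<long[]> ranges = sampleRanges(8100);
        System.out.println("one at a time: " + addOneAtATime(ranges).elementsSorted);
        System.out.println("as one batch:  " + addAsBatch(ranges).elementsSorted);
    }
}
```

In this simplified model the per-add variant hands n(n+1)/2 elements to sort()
for n records, while the batched variant hands over n, and both end with the
same ordering. The real add method does more work than a sort, so the actual
speedup may differ from what this proxy suggests.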



