[ 
https://issues.apache.org/jira/browse/CASSANDRA-17955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17625113#comment-17625113
 ] 

Stefan Miklosovic commented on CASSANDRA-17955:
-----------------------------------------------

As mentioned, I ran the multiplexer on all repair tests. I can use only 20 
runners and a CI job times out after 1 hour, so I tried to find the maximum 
number of repeats that fits across all repair tests. I think I ran all repair 
unit tests around 120 times and all dtests 20 times, and it all went fine. This 
is only the build of trunk (1).

This patch does not introduce any new test, nor does it modify any, but I still 
ran all repair tests in a loop to see whether it is stable, which it seems to 
be. Given limited resources and time constraints, I consider this kind of 
testing sufficient (on top of the regular, mandatory 6 jobs above: 2 per 
branch, 8 and 11 pre-commit).

(1) https://app.circleci.com/pipelines/github/instaclustr/cassandra/1498/workflows/e0f4e61a-cf3b-4ef7-a0c3-d02a25778bb8

> Race condition on repair snapshots
> ----------------------------------
>
>                 Key: CASSANDRA-17955
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17955
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Repair, Local/Snapshots
>            Reporter: Cameron Zemek
>            Assignee: Stefan Miklosovic
>            Priority: Normal
>              Labels: 4.0
>             Fix For: 4.0.x, 4.1-rc, 4.x
>
>         Attachments: signature.asc
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> If an endpoint is convicted and that endpoint is a coordinator then 
> ActiveRepairService::removeParentRepairSession is called.
> The issue is that this occurs on clearSnapshotExecutor and can happen while 
> RepairMessageVerbHandler is in the process of taking a snapshot. The result 
> is a race condition in which clearSnapshot throws a 
> java.nio.file.DirectoryNotEmptyException
>  
> {code:java}
> public static void deleteRecursiveWithThrottle(File dir, RateLimiter rateLimiter)
> {
>     if (dir.isDirectory())
>     {
>         String[] children = dir.list();
>         for (String child : children)
>             deleteRecursiveWithThrottle(new File(dir, child), rateLimiter);
>     }
>     // The directory is now empty so now it can be smoked
>     deleteWithConfirmWithThrottle(dir, rateLimiter);
> }
> {code}
> This happens because the directory is no longer empty when the method goes 
> to remove it at the end.
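
The interleaving described above can be reproduced outside Cassandra with a 
minimal, single-threaded sketch. It is not the Cassandra code and not the fix 
in this ticket: the class SnapshotRaceSketch, its methods, and the file names 
are hypothetical, and the "concurrent" snapshot writer is simulated by a hook 
that runs between deleting the listed children and deleting the directory. It 
also assumes the default file system reports a non-empty directory with 
DirectoryNotEmptyException, which the Files.delete documentation describes as 
an optional specific exception.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryNotEmptyException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

public class SnapshotRaceSketch
{
    /**
     * Deletes the children present at listing time, runs the hook (a stand-in
     * for a concurrent snapshot writer), then tries to delete the directory.
     */
    public static void deleteChildrenThenDir(Path dir, Runnable betweenListAndDelete) throws IOException
    {
        List<Path> children = new ArrayList<>();
        try (Stream<Path> listing = Files.list(dir))
        {
            listing.forEach(children::add);
        }
        for (Path child : children)
            Files.delete(child);

        // Anything created here was not in the listing, so it survives.
        betweenListAndDelete.run();

        // Fails if the hook added a file after the listing was taken.
        Files.delete(dir);
    }

    /** Returns true if the simulated interleaving ends in DirectoryNotEmptyException. */
    public static boolean raceTriggersDirectoryNotEmpty() throws IOException
    {
        Path dir = Files.createTempDirectory("snapshot-race");
        Files.createFile(dir.resolve("existing.db"));
        Path late = dir.resolve("late.db");
        try
        {
            deleteChildrenThenDir(dir, () -> {
                try
                {
                    Files.createFile(late); // the "snapshot" lands mid-delete
                }
                catch (IOException e)
                {
                    throw new UncheckedIOException(e);
                }
            });
            return false;
        }
        catch (DirectoryNotEmptyException e)
        {
            return true;
        }
        finally
        {
            // Clean up whatever the failed delete left behind.
            Files.deleteIfExists(late);
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws IOException
    {
        System.out.println("DirectoryNotEmptyException thrown: " + raceTriggersDirectoryNotEmpty());
    }
}
```

In the real scenario the hook corresponds to RepairMessageVerbHandler writing 
snapshot files on another thread, so the failure depends on timing rather than 
happening on every run.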



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
