[jira] [Commented] (CASSANDRA-6455) Improve concurrency of repair process

Marcus Eriksson (JIRA) Mon, 23 Jun 2014 08:15:58 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040831#comment-14040831
 ]


Marcus Eriksson commented on CASSANDRA-6455:
--------------------------------------------

I really like the refactoring; it makes the repair flow a lot easier to follow.

Comments:
* Seems the rebase lost CASSANDRA-3569 - we need to unregister from the FD once 
all validation messages have arrived.
* We should probably cap how large an X we accept in -j X - it is really easy 
to OOM the nodes involved if you pass a big X.
* Should we make the taskExecutor in RepairSession static? Right now we create 
num_tokens * RF instances of the taskExecutor, and since it is a cached 
thread pool, we might end up with many cached threads (they are killed after 
60s of inactivity, but still). I guess making it static and never shutting it 
down should be fine. I tested this and it reduced the number of threads created 
in that thread pool from ~3k to 12 for a small repair run; tiny patch here: 
https://github.com/krummas/cassandra/commits/yukim/6455
* Why do we add ourselves as a no-op StreamEventHandler in 
LocalSyncTask/StreamingRepairTask when creating the StreamPlan?
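
To illustrate the taskExecutor point above, here is a minimal standalone 
sketch, not the actual patch. It compares one cached thread pool per session 
with a single static pool that is shared and never shut down. The class name 
SharedExecutorSketch, both method names, and the daemon thread factory are 
hypothetical and exist only for this illustration; only 
java.util.concurrent calls are used.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class SharedExecutorSketch {
    // Hypothetical shared pool: static and never shut down. Daemon threads
    // are an assumption of this sketch so the JVM can still exit.
    private static final ExecutorService SHARED =
        Executors.newCachedThreadPool(r -> {
            Thread t = new Thread(r, "shared-task-executor");
            t.setDaemon(true);
            return t;
        });

    // One cached pool per "session": every session has to start a new thread.
    static int threadsWithPoolPerSession(int sessions) throws Exception {
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < sessions; i++) {
            ExecutorService perSession = Executors.newCachedThreadPool();
            try {
                perSession.submit(() -> { seen.add(Thread.currentThread()); }).get();
            } finally {
                perSession.shutdown();
            }
        }
        return seen.size();
    }

    // All "sessions" submit to the same static pool: idle threads can be reused.
    static int threadsWithSharedPool(int sessions) throws Exception {
        Set<Thread> seen = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < sessions; i++) {
            SHARED.submit(() -> { seen.add(Thread.currentThread()); }).get();
        }
        return seen.size();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("per-session pools: " + threadsWithPoolPerSession(8));
        System.out.println("shared pool:       " + threadsWithSharedPool(8));
    }
}
```

With per-session pools the number of distinct threads equals the number of 
sessions. With the shared pool it can be much lower, although the exact count 
depends on timing, since a cached pool only reuses a thread that is already 
idle when the next task arrives.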

> Improve concurrency of repair process
> -------------------------------------
>
>                 Key: CASSANDRA-6455
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6455
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Yuki Morishita
>            Assignee: Yuki Morishita
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: 6455-3.0.txt, 6455.txt
>
>
> Currently, most of the repair tasks (taking snapshots, sending/receiving 
> merkle trees, computing MT differences, etc.) are done on the single-threaded 
> AntiEntropyStage.
> This causes problems like CASSANDRA-6415 and is likely to cause unnecessary 
> waiting.
> Also, repair is done one CF at a time. I think we can parallelize this 
> (with concurrency configurable by the user based on the number of CFs and 
> the load on the nodes) for faster processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
