[ https://issues.apache.org/jira/browse/CASSANDRA-6455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040831#comment-14040831 ]
Marcus Eriksson commented on CASSANDRA-6455:
--------------------------------------------
I really like the refactoring; it makes the repair flow a lot easier to follow.
Comments:
* It seems the rebase lost CASSANDRA-3569: we need to unregister from the
failure detector (FD) once all validation messages have arrived.
* We should probably cap how large X can be in -j X; it is really easy to OOM
the nodes involved if you pass a big X.
* Should we make the taskExecutor in RepairSession static? Right now we create
num_tokens * RF instances of the taskExecutor, and since it is a cached
thread pool, we might end up with many cached threads (idle threads are killed
after 60s of inactivity, but still). I guess making it static and never
shutting it down should be fine. I tested this, and it reduced the number of
threads created in that pool from ~3k to 12 for a small repair run; tiny patch
here: https://github.com/krummas/cassandra/commits/yukim/6455
* Why do we add ourselves as a no-op StreamEventHandler in
LocalSyncTask/StreamingRepairTask when creating the StreamPlan?
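For the CASSANDRA-3569 point, here is a minimal sketch of "unregister from the FD once all validation messages have arrived". It assumes a hypothetical, simplified failure-detector API (FailureListener, FailureDetectorRegistry) and a hypothetical ValidationTracker class; Cassandra's real interfaces and method names differ. The tracker listens only while it is still waiting on some endpoint's Merkle tree.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical minimal failure-detector API (NOT Cassandra's real interfaces).
interface FailureListener {
    void convict(String endpoint);
}

interface FailureDetectorRegistry {
    void register(FailureListener listener);
    void unregister(FailureListener listener);
}

// Hypothetical tracker for one repair job's outstanding validation responses.
class ValidationTracker implements FailureListener {
    private final FailureDetectorRegistry fd;
    private final Set<String> pending;
    private boolean failed;

    ValidationTracker(FailureDetectorRegistry fd, Set<String> endpoints) {
        this.fd = fd;
        this.pending = new HashSet<>(endpoints);
    }

    // Start listening for convictions before validation requests go out.
    void start() {
        fd.register(this);
    }

    // Called when a validation (Merkle tree) response arrives from an endpoint.
    synchronized void onValidationComplete(String endpoint) {
        if (pending.remove(endpoint) && pending.isEmpty()) {
            fd.unregister(this); // every tree received: stop listening
        }
    }

    // Only an endpoint we are still waiting on can fail the job.
    @Override
    public synchronized void convict(String endpoint) {
        if (pending.contains(endpoint)) {
            failed = true;
            pending.clear();
            fd.unregister(this);
        }
    }

    synchronized boolean isFailed() {
        return failed;
    }
}
```

Unregistering from inside convict assumes the detector iterates a copy-safe listener collection (for example a CopyOnWriteArrayList); otherwise the removal would need to be deferred.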
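The static-taskExecutor suggestion can be sketched as one cached pool shared by every session. RepairSessionSketch below is a hypothetical, simplified stand-in, not Cassandra's RepairSession. Executors.newCachedThreadPool reuses idle threads and terminates them after 60s of inactivity; the threads are made daemons here (an assumption) so that a pool which is never shut down does not keep the JVM alive.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified stand-in for a repair session (not Cassandra's class).
class RepairSessionSketch {
    // Counts threads the shared pool has ever created, for illustration only.
    static final AtomicInteger THREADS_CREATED = new AtomicInteger();

    // One pool for all sessions instead of one pool per session; never shut down.
    // Daemon threads let the JVM exit regardless.
    private static final ExecutorService TASK_EXECUTOR = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "RepairTask:" + THREADS_CREATED.incrementAndGet());
        t.setDaemon(true);
        return t;
    });

    private final String sessionId;

    RepairSessionSketch(String sessionId) {
        this.sessionId = sessionId;
    }

    // Runs one unit of session work (e.g. a sync task) on the shared pool.
    CompletableFuture<String> submit(String task) {
        return CompletableFuture.supplyAsync(() -> sessionId + ":" + task, TASK_EXECUTOR);
    }
}
```

Because idle threads are handed to the next session's tasks instead of dying with a per-session pool, the total number of threads created tracks peak concurrency rather than the number of sessions, which matches the drop Marcus observed.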
> Improve concurrency of repair process
> -------------------------------------
>
> Key: CASSANDRA-6455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6455
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Yuki Morishita
> Assignee: Yuki Morishita
> Priority: Minor
> Fix For: 3.0
>
> Attachments: 6455-3.0.txt, 6455.txt
>
>
> Currently, most of the repair tasks (taking snapshots, sending/receiving Merkle
> trees, computing Merkle tree differences, etc.) are done on the single-threaded
> AntiEntropyStage.
> This causes problems like CASSANDRA-6415 and is likely to cause unnecessary
> waiting.
> Also, repair is done one CF at a time. I think we can parallelize this (with
> concurrency configurable by the user based on the number of CFs and the load
> of the nodes) for faster processing.
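The proposal above (repairing several CFs in parallel with user-configurable concurrency), combined with the review point about capping X in -j X, can be sketched with a fixed-size pool whose size is the clamped user value. ParallelCfRepairSketch, MAX_JOB_THREADS and the repairOne callback are hypothetical; the comment does not say what the cap should be, so 4 is only a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class ParallelCfRepairSketch {
    // Hypothetical placeholder cap; the real limit is not specified in the comment.
    static final int MAX_JOB_THREADS = 4;

    // Clamp the user-requested X from "-j X" into [1, MAX_JOB_THREADS].
    static int effectiveParallelism(int requested) {
        return Math.max(1, Math.min(requested, MAX_JOB_THREADS));
    }

    // Repair every CF, with at most effectiveParallelism(requested) running at once.
    static List<String> repairAll(List<String> cfs, int requested, Function<String, String> repairOne) {
        ExecutorService pool = Executors.newFixedThreadPool(effectiveParallelism(requested));
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String cf : cfs) {
                futures.add(pool.submit(() -> repairOne.apply(cf)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // results come back in input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A fixed pool bounds how many CFs are repaired concurrently no matter what the user asks for, which is the property the -j X cap is meant to protect.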