[
https://issues.apache.org/jira/browse/CASSANDRA-12146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yuki Morishita updated CASSANDRA-12146:
---------------------------------------
Resolution: Fixed
Fix Version/s: 3.9, 3.0.9, 2.2.8 (was: 3.0.x, 2.2.x, 3.x)
Status: Resolved (was: Patch Available)
Thanks for the patch. Nice idea.
+1 and committed as {f28409bb9730c0318c3243f9d0febbb05ec0c2dc}.
> Use dedicated executor for sending JMX notifications
> ----------------------------------------------------
>
> Key: CASSANDRA-12146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12146
> Project: Cassandra
> Issue Type: Bug
> Components: Observability
> Reporter: Stefan Podkowinski
> Assignee: Stefan Podkowinski
> Fix For: 2.2.8, 3.0.9, 3.9
>
> Attachments: 12146-2.2.patch
>
>
> I'm currently looking into an issue with our repair process where we
> notice a significant delay between the end of the repair task and the point
> at which nodetool actually terminates. At the same time, JMX NOTIF_LOST
> errors are reported in nodetool during most repair runs.
> Currently {{StorageService.repairAsync(keyspace, options)}} is called through
> JMX, which will start a new thread executing RepairRunnable using the
> provided options. StorageService itself extends
> NotificationBroadcasterSupport and will send JMX progress notifications
> emitted from RepairRunnable (or during bootstrap). If you take a closer look
> at {{RepairRunnable}}, {{JMXProgressSupport}} and
> {{StorageService/NotificationBroadcasterSupport.sendNotification}}, you'll
> notice that this all happens within the calling thread, i.e. RepairRunnable.
> Given the lost notifications and all kinds of potential networking-related
> issues, I'm not really comfortable having the repair coordinator thread
> running in the JMX stack. Fortunately, NotificationBroadcasterSupport accepts
> a custom executor as a constructor argument. See the attached patch.
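
For illustration only, here is a minimal sketch of the idea described above. It is not the attached patch. The class names DedicatedExecutorSketch and ProgressBroadcaster, the thread name "jmx-notifications" and the notification type are all made up for this example; only the javax.management and java.util.concurrent calls are real JDK API. It shows that a NotificationBroadcasterSupport constructed with an Executor invokes its listeners on that executor's thread rather than on the thread calling sendNotification.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import javax.management.Notification;
import javax.management.NotificationBroadcasterSupport;

public class DedicatedExecutorSketch {

    // Hypothetical stand-in for a broadcaster such as StorageService.
    static class ProgressBroadcaster extends NotificationBroadcasterSupport {
        ProgressBroadcaster(Executor executor) {
            // Listeners are invoked through this executor instead of
            // directly on the thread that calls sendNotification().
            super(executor);
        }
    }

    // Sends one notification and returns the name of the thread
    // on which the listener was actually invoked.
    static String deliveryThreadName() throws InterruptedException {
        // Assumed setup: a single daemon thread dedicated to notifications.
        ExecutorService jmxExecutor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "jmx-notifications");
            t.setDaemon(true);
            return t;
        });
        try {
            ProgressBroadcaster broadcaster = new ProgressBroadcaster(jmxExecutor);
            AtomicReference<String> seenOn = new AtomicReference<>();
            CountDownLatch delivered = new CountDownLatch(1);

            // No filter and no handback object are needed for this sketch.
            broadcaster.addNotificationListener((notification, handback) -> {
                seenOn.set(Thread.currentThread().getName());
                delivered.countDown();
            }, null, null);

            // The caller only hands the notification to the executor and returns.
            broadcaster.sendNotification(
                new Notification("progress", "repair-sketch", 1L, "repair finished"));

            if (!delivered.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("notification was not delivered");
            }
            return seenOn.get();
        } finally {
            jmxExecutor.shutdown();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("caller thread:   " + Thread.currentThread().getName());
        System.out.println("listener thread: " + deliveryThreadName());
    }
}
```

With the no-argument constructor, the same listener would run on the calling thread, which is the situation the description objects to for the repair coordinator.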