[
https://issues.apache.org/jira/browse/CASSANDRA-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15983831#comment-15983831
]
Simon Zhou edited comment on CASSANDRA-13387 at 4/25/17 11:33 PM:
------------------------------------------------------------------
Patch for 3.0.x is [here|https://github.com/szhou1234/cassandra/commit/7d7f55d71623ac9cc4912833b5f4b2562d6263fc].
Exception metrics are emitted at the keyspace level (RepairRunnable). We could
emit them at a finer granularity, but that would mean more exceptions, especially for
primary range repair. For monitoring purposes, I think keyspace-level metrics
are enough, but let me know if you have a different opinion. Once the initial review
passes, I'll work on a patch for trunk.
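A minimal sketch of the keyspace-level trade-off, assuming a plain JDK counter per keyspace instead of Cassandra's own metrics registry. The classes `KeyspaceRepairMetrics` and `KeyspaceRepairTask` are hypothetical and are not taken from the linked patch; the sketch only shows that a keyspace whose repair fails on several ranges is counted once.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-keyspace failure counter (not Cassandra's metrics API).
final class KeyspaceRepairMetrics {
    private final Map<String, LongAdder> failures = new ConcurrentHashMap<>();

    // Record one failed repair for the given keyspace.
    void markFailure(String keyspace) {
        failures.computeIfAbsent(keyspace, k -> new LongAdder()).increment();
    }

    // Failures recorded so far for a keyspace; 0 if none.
    long failureCount(String keyspace) {
        LongAdder count = failures.get(keyspace);
        return count == null ? 0L : count.sum();
    }
}

// Hypothetical stand-in for a keyspace-level repair driver such as RepairRunnable.
final class KeyspaceRepairTask {
    private KeyspaceRepairTask() {}

    // Runs every range repair of one keyspace and records at most one
    // failure for the keyspace, however many ranges threw.
    static boolean repairKeyspace(String keyspace, List<Runnable> rangeRepairs,
                                  KeyspaceRepairMetrics metrics) {
        boolean failed = false;
        for (Runnable range : rangeRepairs) {
            try {
                range.run();
            } catch (RuntimeException e) {
                failed = true; // counted once below, not once per range
            }
        }
        if (failed) {
            metrics.markFailure(keyspace);
        }
        return !failed;
    }
}
```

Emitting the metric per range instead would mean one increment per failed range, which is the extra volume the comment wants to avoid for primary range repair.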
> Metrics for repair
> ------------------
>
> Key: CASSANDRA-13387
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13387
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Simon Zhou
> Assignee: Simon Zhou
> Priority: Minor
>
> We're missing metrics for repair, especially for errors. From what I have observed,
> the exception is caught by the UncaughtExceptionHandler set in
> CassandraDaemon and counted under StorageMetrics.exceptions. Here is one
> example:
> {code}
> ERROR [AntiEntropyStage:1] 2017-03-27 18:17:08,385 CassandraDaemon.java:207 - Exception in thread Thread[AntiEntropyStage:1,5,main]
> java.lang.RuntimeException: Parent repair session with id = 8c85d260-1319-11e7-82a2-25090a89015f has failed.
> 	at org.apache.cassandra.service.ActiveRepairService.getParentRepairSession(ActiveRepairService.java:377) ~[apache-cassandra-3.0.10.jar:3.0.10]
> 	at org.apache.cassandra.service.ActiveRepairService.removeParentRepairSession(ActiveRepairService.java:392) ~[apache-cassandra-3.0.10.jar:3.0.10]
> 	at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:172) ~[apache-cassandra-3.0.10.jar:3.0.10]
> 	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-3.0.10.jar:3.0.10]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_121]
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_121]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_121]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_121]
> 	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {code}
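To illustrate why a repair failure like the one in the issue description only shows up as a generic exception count, here is a minimal JDK-only sketch, assuming a default `Thread.UncaughtExceptionHandler` that increments one node-wide counter. `GlobalExceptionCounter` and its `EXCEPTIONS` field are hypothetical stand-ins for the handler set in CassandraDaemon and for StorageMetrics.exceptions, not their actual code.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a JVM-wide default handler bumps a single counter,
// so a failed repair is indistinguishable from any other uncaught exception.
final class GlobalExceptionCounter {
    // Stand-in for StorageMetrics.exceptions (illustrative only).
    static final AtomicLong EXCEPTIONS = new AtomicLong();

    private GlobalExceptionCounter() {}

    // Install the handler for every thread that has no handler of its own.
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            EXCEPTIONS.incrementAndGet(); // no keyspace, no repair session id
            System.err.println("Exception in thread " + thread + ": " + error);
        });
    }
}
```

Because the counter carries no keyspace or repair session context, an alert on it fires for any uncaught exception on the node, which is the gap this ticket asks to close.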