[jira] [Commented] (CASSSIDECAR-354) cassinstancesdown/up metrics not updating when instances go down

Stefan Miklosovic (Jira) Sat, 18 Oct 2025 15:05:15 -0700


    [ 
https://issues.apache.org/jira/browse/CASSSIDECAR-354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028998#comment-18028998
 ]


Stefan Miklosovic commented on CASSSIDECAR-354:
-----------------------------------------------

[~frankgh] any insights?

> cassinstancesdown/up metrics not updating when instances go down
> ----------------------------------------------------------------
>
>                 Key: CASSSIDECAR-354
>                 URL: https://issues.apache.org/jira/browse/CASSSIDECAR-354
>             Project: Sidecar for Apache Cassandra
>          Issue Type: Bug
>          Components: Observability
>            Reporter: Carl Sandland
>            Priority: Major
>
> When stopping a Cassandra 'instance', sidecar does not update these metrics 
> correctly: the onFailure() block that performs the updates is never 
> called, because exceptions are swallowed in the delegate. Swallowing 
> exceptions does not work well with promise chains.
> My expectations were:
> Assume a simple sidecar config with one attached Cassandra instance, all 
> started up and running happily: cassinstancesdown = 0, cassinstancesup = 1. 
> Then manually stop Cassandra: cassinstancesdown = 1, cassinstancesup = 0.
> Instead, I was seeing a constant: cassinstancesdown = 0, cassinstancesup = 1.
> Specifically, the code here:
> {code:java}
> private Future<Void> healthCheck(InstanceMetadata instanceMetadata, AtomicInteger instanceDown)
> {
>     return internalPool
>            .runBlocking(() -> instanceMetadata.delegate().healthCheck(), false)
>            .onFailure(cause -> {
>                instanceDown.incrementAndGet();
>                LOGGER.error("Unable to complete health check on instance={}",
>                             instanceMetadata.id(), cause);
>            });
> }
> {code}
> The metric is updated in onFailure(), yet the exceptions that would 
> trigger a failure (such as not being able to connect) are swallowed by the 
> delegate (CassandraAdapterDelegate) healthCheck() call.
> I experimented by re-throwing the exceptions in the delegate and the metric 
> started tracking correctly. There is quite a lot of state change in the 
> exception handlers of the delegate, so I didn't feel comfortable 'throwing' 
> a simplistic PR out.
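
To illustrate the mechanism described in the report, here is a minimal, self-contained sketch. It is not sidecar code: it uses the JDK's CompletableFuture as a stand-in for the Vert.x promise chain, and the class SwallowedFailureDemo, the healthCheck(boolean) stub and the instancesDown(boolean) helper are hypothetical names introduced only for this example. It shows that a failure callback cannot fire when the task catches its own exception and returns normally.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;

public class SwallowedFailureDemo {

    /** Hypothetical stand-in for the delegate's health check; it always fails to "connect". */
    static void healthCheck(boolean swallow) {
        try {
            throw new IllegalStateException("cannot connect to instance");
        } catch (RuntimeException e) {
            if (!swallow) {
                throw e; // propagate, so the future completes exceptionally
            }
            // Swallowed: the caller sees a normal return and the future succeeds.
        }
    }

    /** Runs one health check and returns how many instances were counted as down. */
    static int instancesDown(boolean swallow) {
        AtomicInteger instanceDown = new AtomicInteger();
        CompletableFuture
            .runAsync(() -> healthCheck(swallow))
            // Plays the role of onFailure(): only counts when a cause is present.
            .whenComplete((ignored, cause) -> {
                if (cause != null) {
                    instanceDown.incrementAndGet();
                }
            })
            // Convert a failed future into a normal one so join() does not throw.
            .handle((ignored, cause) -> null)
            .join();
        return instanceDown.get();
    }

    public static void main(String[] args) {
        System.out.println("swallowed: down=" + instancesDown(true));  // prints "swallowed: down=0"
        System.out.println("rethrown:  down=" + instancesDown(false)); // prints "rethrown:  down=1"
    }
}
```

In this sketch the counter only moves when the exception escapes the task, which matches the reporter's observation that re-throwing in the delegate made the metric track correctly. Whether re-throwing is the right fix for the real delegate, given its state changes in the exception handlers, is a separate question.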



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
