[ 
https://issues.apache.org/jira/browse/CASSJAVA-106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18049861#comment-18049861
 ] 

Lukasz Antoniak commented on CASSJAVA-106:
------------------------------------------

Can you check whether setting {{resolve-contact-points = false}} fixes the issue for you?
See the documentation
[here|https://github.com/apache/cassandra-java-driver/blob/4.x/core/src/main/resources/reference.conf#L1088].
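For context, as we read the linked reference.conf, {{advanced.resolve-contact-points}} (default {{true}}) decides whether the contact points listed in {{basic.contact-points}} are resolved once, when their {{InetSocketAddress}} is created, or kept unresolved so that the host name is looked up again each time a new connection is opened. It is documented as applying only to contact points from the configuration, not to addresses passed programmatically to {{SessionBuilder.addContactPoints}}. The sketch below uses only the JDK to show the two kinds of address. The class and method names are hypothetical, and it does not call the driver itself.

```java
import java.net.InetSocketAddress;

// Hypothetical JDK-only illustration; it does not use the Cassandra driver.
public class ContactPointResolution {

    // Roughly what resolve-contact-points = true amounts to (per reference.conf):
    // the constructor looks the host up immediately and the resulting IP is kept.
    public static InetSocketAddress resolvedOnce(String host, int port) {
        return new InetSocketAddress(host, port);
    }

    // Roughly what resolve-contact-points = false amounts to: only the host name and
    // port are stored, leaving the lookup to whoever opens the connection later.
    public static InetSocketAddress leftUnresolved(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        // Hypothetical pod host name; createUnresolved performs no DNS query.
        InetSocketAddress byName = leftUnresolved("cassandra-datacenter1-rack1-0", 9042);
        System.out.println(byName.isUnresolved()); // true

        // An IP literal is parsed locally, so this also runs without network access.
        InetSocketAddress byIp = resolvedOnce("214.22.161.195", 9042);
        System.out.println(byIp.isUnresolved()); // false
        System.out.println(byIp.getAddress().getHostAddress()); // 214.22.161.195
    }
}
```

Note that the JVM and the OS keep their own DNS caches, so deferring resolution in the driver may not be sufficient on its own when pod IPs change.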

> Gauge counters for open-connections not updated after Cassandra pod 
> recreation in geographical redundant setup
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSJAVA-106
>                 URL: https://issues.apache.org/jira/browse/CASSJAVA-106
>             Project: Apache Cassandra Java driver
>          Issue Type: Bug
>            Reporter: Ioannis Stoltidis
>            Priority: Normal
>
> We are running a containerized version of Cassandra in a geographically
> redundant setup with two datacenters. Each datacenter contains three Cassandra
> pods, which are managed as part of a Cassandra StatefulSet. Every pod has an
> associated Kubernetes service with a load balancer IP address. This IP
> remains constant and is used as the host address for internode communication
> among all Cassandra pods. Additionally, each datacenter includes a pod running
> our application, which uses the Cassandra driver to communicate with the pool
> of Cassandra pods. We use the DataStax Java driver, configured as follows:
>  * Two contact points are specified, pointing to the first two pods
> (cassandra-datacenter1_rack1-0 and cassandra-datacenter1_rack1-1).
>  * After all endpoints are discovered, the driver opens one connection per
> node in the local DC, plus one control connection.
> The mapping between host names and IP addresses is as follows:
> ||host name||IP||
> |cassandra-datacenter1_rack1-0|214.22.161.195|
> |cassandra-datacenter1_rack1-1|214.22.161.196|
> |cassandra-datacenter1_rack1-2|214.22.161.197|
> While monitoring Cassandra connections through the gauge counters exposed via
> the Prometheus Dropwizard exporter, we observed that some metric names contain
> host names while others contain IP addresses, and that at least one node
> appears to be reported twice (cassandra-datacenter1_rack1-1 shows up both by
> name and as 214.22.161.196). We observe the following four gauge counters:
> {noformat}
> s0.nodes.214_22_161_196:9042.pool.open-connections → initial value: 1
> s0.nodes.214_22_161_197:9042.pool.open-connections → initial value: 2
> s0.nodes.cassandra-datacenter1-rack1-0_cassandra-datacenter1-rack1:9042.pool.open-connections → initial value: 1
> s0.nodes.cassandra-datacenter1-rack1-1_cassandra-datacenter1-rack1:9042.pool.open-connections → initial value: 0
> {noformat}
> We then tested the following recovery procedure on two of the three pods in
> the local datacenter:
>  * Halt the Cassandra container using: {{echo STOPPED > /var/lib/cassandra/.cassandra.init && pkill java}}
>  * Remove the Persistent Volume Claim (PVC) associated with each of the two pods
>  * Run {{nodetool removenode}} on the cluster to clean up the old instances
>  * Restart the two pods and re-enable Cassandra using: {{echo RUNNING > /var/lib/cassandra/.cassandra.init}}
> Afterwards, the gauge counters are no longer updated correctly.
> Specifically, they change to:
> {noformat}
> s0.nodes.214_22_161_196:9042.pool.open-connections → 0
> s0.nodes.214_22_161_197:9042.pool.open-connections → 2
> s0.nodes.cassandra-datacenter1-rack1-0_cassandra-datacenter1-rack1:9042.pool.open-connections → 0
> s0.nodes.cassandra-datacenter1-rack1-1_cassandra-datacenter1-rack1:9042.pool.open-connections → 0
> {noformat}
> No other counters are created. These values remain stuck and do not reflect
> the actual state of the connection pool: on the server side we can verify
> that all expected connections are up again (one connection per node plus one
> control connection). The values are only corrected when we manually restart
> the application pod running the DataStax Java driver, which recreates the
> session.
> *Expected behavior:*
> Gauge counters should reflect the actual number of open connections even 
> after the Cassandra pods are deleted and recreated.
> *Observed behavior:*
> After pod recreation and node replacement, the counters stay at incorrect 
> values until the client session is forcibly reset by restarting the 
> application.
> *Environment:*
> Cassandra: containerized
> Java driver: DataStax Java driver (version 4.19.0)
> Monitoring via: simpleclient_dropwizard of io.prometheus
> Setup: Geo-redundant, 2 datacenters, 3 pods per datacenter
> *Impact:*
> This behavior results in stale monitoring data and obscures actual cluster 
> health and connectivity, particularly in automated or production setups.
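To make the duplication in the report easier to see, the following sketch groups the open-connections values listed above by physical node. It assumes, from the host/IP table in the issue, that the metric segments starting with cassandra-datacenter1-rack1-0 and cassandra-datacenter1-rack1-1 refer to 214.22.161.195 and 214.22.161.196 respectively. The lookup table, class and method names are hypothetical and not part of the driver or the exporter.

```java
import java.util.Map;
import java.util.TreeMap;

public class OpenConnectionsByNode {
    // Hypothetical mapping from a metric's node segment to the node's IP, assumed
    // from the host/IP table in the issue. IP-named segments map to the same IP.
    static final Map<String, String> NODE_TO_IP = Map.of(
        "214_22_161_196:9042", "214.22.161.196",
        "214_22_161_197:9042", "214.22.161.197",
        "cassandra-datacenter1-rack1-0_cassandra-datacenter1-rack1:9042", "214.22.161.195",
        "cassandra-datacenter1-rack1-1_cassandra-datacenter1-rack1:9042", "214.22.161.196");

    static final String PREFIX = "s0.nodes.";
    static final String SUFFIX = ".pool.open-connections";

    // Sums the open-connections values per physical node, keyed (and sorted) by IP.
    public static Map<String, Integer> byIp(Map<String, Integer> gauges) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : gauges.entrySet()) {
            String name = e.getKey();
            if (!name.startsWith(PREFIX) || !name.endsWith(SUFFIX)) {
                continue; // not a node-level open-connections gauge
            }
            String node = name.substring(PREFIX.length(), name.length() - SUFFIX.length());
            String ip = NODE_TO_IP.getOrDefault(node, node);
            totals.merge(ip, e.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> initial = Map.of(
            "s0.nodes.214_22_161_196:9042.pool.open-connections", 1,
            "s0.nodes.214_22_161_197:9042.pool.open-connections", 2,
            "s0.nodes.cassandra-datacenter1-rack1-0_cassandra-datacenter1-rack1:9042.pool.open-connections", 1,
            "s0.nodes.cassandra-datacenter1-rack1-1_cassandra-datacenter1-rack1:9042.pool.open-connections", 0);
        Map<String, Integer> afterRecovery = Map.of(
            "s0.nodes.214_22_161_196:9042.pool.open-connections", 0,
            "s0.nodes.214_22_161_197:9042.pool.open-connections", 2,
            "s0.nodes.cassandra-datacenter1-rack1-0_cassandra-datacenter1-rack1:9042.pool.open-connections", 0,
            "s0.nodes.cassandra-datacenter1-rack1-1_cassandra-datacenter1-rack1:9042.pool.open-connections", 0);
        System.out.println(byIp(initial));       // {214.22.161.195=1, 214.22.161.196=1, 214.22.161.197=2}
        System.out.println(byIp(afterRecovery)); // {214.22.161.195=0, 214.22.161.196=0, 214.22.161.197=2}
    }
}
```

Under that assumption, 214.22.161.196 is the node reported under two metric names, and after the recovery procedure only 214.22.161.197 still shows open connections, even though the server side reports all connections re-established.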



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
