[
https://issues.apache.org/jira/browse/CASSANDRA-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Orlowski updated CASSANDRA-19598:
----------------------------------------
Attachment: image-2024-04-29-22-57-26-910.png
> advanced.resolve-contact-points: unresolved hostname being clobbered during
> reconnection
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19598
> Project: Cassandra
> Issue Type: Bug
> Components: Client/java-driver
> Reporter: Andrew Orlowski
> Priority: Normal
> Attachments: image-2024-04-29-20-13-56-161.png,
> image-2024-04-29-20-40-53-382.png, image-2024-04-29-22-57-26-910.png
>
>
> Hello, this is a bug report against version 4.18.0 of the Java driver.
>
> I am running in an environment with 3 Cassandra nodes. Our use case requires
> redeploying the cluster from the ground up at midnight every day. This
> means that all 3 nodes become unavailable for a short period of time, and 3
> new nodes with 3 new IP addresses are spun up and placed behind the contact
> point hostname. We provide a single hostname as the contact point. If you set
> {{advanced.resolve-contact-points}} to {{false}}, the Java driver should
> re-resolve the hostname for every new connection to that node. This works up
> to and including the first redeployment, but the unresolved hostname is
> clobbered during the reconnection process and replaced with a resolved IP
> address, so the driver cannot recover from any later redeployment.
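
The difference between an unresolved and a resolved contact point can be illustrated with the JDK alone. The following is a minimal sketch, not the driver's implementation: the class `ContactPointResolution` and the helper `resolveNow` are hypothetical names, and `localhost` stands in for the real contact point hostname. It only shows that an unresolved `InetSocketAddress` keeps the hostname, so a fresh lookup can be performed for each connection attempt.

```java
import java.net.InetSocketAddress;

final class ContactPointResolution {

    // Hypothetical helper: builds a new address from the stored hostname.
    // The InetSocketAddress(String, int) constructor attempts a lookup at
    // construction time, so each call can observe a changed DNS record.
    static InetSocketAddress resolveNow(InetSocketAddress unresolved) {
        return new InetSocketAddress(unresolved.getHostString(), unresolved.getPort());
    }

    public static void main(String[] args) {
        // Stored form: hostname only, no lookup performed yet.
        InetSocketAddress contactPoint = InetSocketAddress.createUnresolved("localhost", 9042);
        System.out.println("stored unresolved: " + contactPoint.isUnresolved());

        // Per-connection form: looked up at the moment of the attempt.
        InetSocketAddress attempt = resolveNow(contactPoint);
        System.out.println("attempt unresolved: " + attempt.isUnresolved());

        // The stored contact point is unchanged and can be looked up again later.
        System.out.println("still holds hostname: " + contactPoint.getHostString());
    }
}
```

Once the stored entry is replaced by an already resolved address, the hostname is no longer consulted, which matches the behavior described in this report.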
>
> In our case, all 3 nodes become unavailable while our CI/CD process destroys
> the existing cluster and replaces it with a new one. During the window of
> unavailability, the Java driver attempts to reconnect to each node.
> Internally, the driver holds resolved IP addresses for two of the nodes and
> the unresolved hostname for the third. Here is a screenshot that captures
> the internal state of the 3 nodes within `PoolManager` before the
> redeployment of the cluster has finished. Note that there are 2 resolved IP
> addresses and 1 unresolved hostname.
> !image-2024-04-29-20-13-56-161.png|width=985,height=181!
> This ratio of 2 resolved IP addresses to 1 unresolved hostname is the correct
> internal state for a 3 node cluster when `advanced.resolve-contact-points` is
> set to `false`. Eventually, the hostname points to one of the 3 new valid
> nodes, and the Java driver reconnects and discovers the new peers. However,
> as part of this reconnection process, the internal Node that held the
> unresolved hostname is overwritten with a Node that has the resolved IP
> address:
> !image-2024-04-29-20-40-53-382.png|width=1080,height=102!
> Note that we no longer have 2 resolved IP addresses and 1 unresolved
> hostname; rather, we have 3 resolved IP addresses, which is an incorrect
> internal state when `advanced.resolve-contact-points` is set to `false`. One
> of the nodes should have retained the unresolved hostname.
> At this stage, the Java driver no longer queries the hostname for new
> connections, and our subsequent redeployments fail because the hostname is
> no longer among the nodes that are tried for reconnection. We then have to
> restart the application.
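
The two internal states described above can be modeled with a small check. This is a simplified sketch using plain `InetSocketAddress` values rather than the driver's internal node types; the class `EndpointStateCheck`, its methods, the hostname `cassandra.example.internal`, and the `192.0.2.x` documentation addresses are all hypothetical.

```java
import java.net.InetSocketAddress;
import java.util.List;

final class EndpointStateCheck {

    // Counts the endpoints that still hold a hostname instead of a resolved IP.
    static long countUnresolved(List<InetSocketAddress> endpoints) {
        return endpoints.stream().filter(InetSocketAddress::isUnresolved).count();
    }

    // With resolve-contact-points set to false, at least one endpoint must stay
    // unresolved so that a later reconnection can look the hostname up again.
    static boolean canRediscover(List<InetSocketAddress> endpoints) {
        return countUnresolved(endpoints) > 0;
    }

    public static void main(String[] args) {
        // State in the first screenshot: 1 unresolved hostname, 2 resolved IPs.
        List<InetSocketAddress> beforeReconnect = List.of(
                InetSocketAddress.createUnresolved("cassandra.example.internal", 9042),
                new InetSocketAddress("192.0.2.11", 9042),
                new InetSocketAddress("192.0.2.12", 9042));

        // State in the second screenshot: 3 resolved IPs, hostname lost.
        List<InetSocketAddress> afterReconnect = List.of(
                new InetSocketAddress("192.0.2.21", 9042),
                new InetSocketAddress("192.0.2.22", 9042),
                new InetSocketAddress("192.0.2.23", 9042));

        System.out.println(canRediscover(beforeReconnect)); // true
        System.out.println(canRediscover(afterReconnect));  // false
    }
}
```

A check of this kind, run against the node list after a reconnection, would flag the state in the second screenshot as one from which the hostname can no longer be re-resolved.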