[
https://issues.apache.org/jira/browse/CASSANDRA-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Orlowski updated CASSANDRA-19598:
----------------------------------------
Description:
Hello, this is a bug ticket for 4.18.0 of the Java driver.
I am running in an environment where I have 3 Cassandra nodes. We have a use
case to redeploy the cluster from the ground up at midnight every day. This
means that all 3 nodes become unavailable for a short period of time and 3 new
nodes with 3 new IP addresses get spun up and placed behind the contact point
hostname. If you set {{advanced.resolve-contact-points}} to FALSE, the Java
driver should re-resolve the hostname for every new connection to that node.
This works before and during the first redeployment, but the unresolved
hostname is then clobbered during the reconnection process and replaced with a
resolved IP address, so later redeployments fail. We provide a single hostname
as a contact point.
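For readers unfamiliar with the distinction, the following is a minimal sketch of what "unresolved" versus "resolved" means at the JDK level, using only `java.net.InetSocketAddress`. It is not the driver's code. The class name `ContactPointSketch`, its helper methods, and the hostname `cassandra.example.internal` are hypothetical and chosen for illustration. In the driver's configuration file the option discussed here is written as `datastax-java-driver.advanced.resolve-contact-points = false`.

```java
import java.net.InetSocketAddress;

public class ContactPointSketch {

    // Keeps only the hostname and port; no DNS lookup is performed here.
    static InetSocketAddress unresolved(String host, int port) {
        return InetSocketAddress.createUnresolved(host, port);
    }

    // Builds a new address from the stored hostname, which triggers a lookup
    // at call time. Calling this for every new connection is what lets a
    // client follow a hostname whose target IP changes.
    static InetSocketAddress resolveNow(InetSocketAddress contactPoint) {
        return new InetSocketAddress(contactPoint.getHostString(), contactPoint.getPort());
    }

    public static void main(String[] args) {
        // Hypothetical contact point; 9042 is the default CQL native port.
        InetSocketAddress contactPoint = unresolved("cassandra.example.internal", 9042);
        System.out.println("unresolved? " + contactPoint.isUnresolved());
        System.out.println("hostname kept: " + contactPoint.getHostString());

        // An already-resolved address has lost the indirection: it keeps
        // pointing at the IP it was created with.
        InetSocketAddress pinned = new InetSocketAddress("127.0.0.1", 9042);
        System.out.println("pinned unresolved? " + pinned.isUnresolved());
    }
}
```

Once an entry holds a resolved address, a later DNS change behind the hostname is invisible to it, which is the situation this ticket describes.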
In our case, what is happening is that all 3 nodes become unavailable while our
CI/CD process is destroying the existing cluster and replacing it with a new
one. During the window of unavailability, the Java driver attempts to reconnect
to each node, two of which internally (internal to the driver) have resolved IP
addresses and one of which retains the unresolved hostname. Here is a
screenshot that captures the internal state of the 3 nodes within `PoolManager`
prior to the finished redeployment of the cluster. Note that there are 2
resolved IP addresses and 1 unresolved hostname.
!image-2024-04-29-20-13-56-161.png|width=985,height=181!
This 2:1 ratio of resolved IP:unresolved hostname is the correct internal state
for a 3 node cluster when `advanced.resolve-contact-points` is set to `FALSE`.
Eventually, the hostname points to one of the 3 new valid nodes, and the Java
driver reconnects and discovers the new peers. However, as part of this
reconnection process, the internal Node that held the unresolved hostname is
now overwritten with a Node that has the resolved IP address:
!image-2024-04-29-22-57-26-910.png|width=753,height=107!
Note that we no longer have 2 resolved IP addresses and 1 unresolved hostname;
rather, we have 3 resolved IP addresses, which is an incorrect internal state
when `advanced.resolve-contact-points` is set to `FALSE`. One of the nodes
should have retained the unresolved hostname.
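The two states described above can be expressed as a small invariant check. This is an illustrative model only, not the driver's `PoolManager` or node-list implementation. The class `EndpointStateSketch`, its methods, the hostname, and the IP addresses are all hypothetical stand-ins for the values shown in the screenshots.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.List;

public class EndpointStateSketch {

    // Number of endpoints that still carry a hostname to be resolved later.
    static long countUnresolved(List<InetSocketAddress> endpoints) {
        return endpoints.stream().filter(InetSocketAddress::isUnresolved).count();
    }

    // With resolve-contact-points set to false and a single hostname contact
    // point, at least one endpoint is expected to stay unresolved.
    static boolean canFollowHostname(List<InetSocketAddress> endpoints) {
        return countUnresolved(endpoints) > 0;
    }

    // Expected state: one unresolved contact point plus two discovered peers.
    static List<InetSocketAddress> beforeReconnect() {
        return Arrays.asList(
            InetSocketAddress.createUnresolved("cassandra.example.internal", 9042),
            new InetSocketAddress("10.0.0.2", 9042),
            new InetSocketAddress("10.0.0.3", 9042));
    }

    // Reported state: the contact point entry was replaced by a resolved IP.
    static List<InetSocketAddress> afterReconnect() {
        return Arrays.asList(
            new InetSocketAddress("10.0.1.1", 9042),
            new InetSocketAddress("10.0.1.2", 9042),
            new InetSocketAddress("10.0.1.3", 9042));
    }

    public static void main(String[] args) {
        System.out.println("before: unresolved=" + countUnresolved(beforeReconnect())
            + ", can follow hostname=" + canFollowHostname(beforeReconnect()));
        System.out.println("after:  unresolved=" + countUnresolved(afterReconnect())
            + ", can follow hostname=" + canFollowHostname(afterReconnect()));
    }
}
```

In this model the "after" list has no entry left that would trigger a fresh DNS lookup, so a second redeployment leaves every known endpoint pointing at an address that no longer exists.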
At this stage, the Java driver no longer queries the hostname for new
connections, and our subsequent redeployments fail because the hostname is no
longer among the nodes that are tried for reconnection. We then have to
restart the application to recover.
was:
Hello, this is a bug ticket for 4.18.0 of the Java driver.
I am running in an environment where I have 3 Cassandra nodes. We have a use
case to redeploy the cluster from the ground up at midnight every day. This
means that all 3 nodes become unavailable for a short period of time and 3 new
nodes with 3 new IP addresses get spun up and placed behind the contact point
hostname. If you set {{advanced.resolve-contact-points}} to FALSE, the Java
driver should re-resolve the hostname for every new connection to that node.
This works before and during the first redeployment, but the unresolved
hostname is then clobbered during the reconnection process and replaced with a
resolved IP address, so later redeployments fail. We provide a single hostname
as a contact point.
In our case, what is happening is that all 3 nodes become unavailable while our
CI/CD process is destroying the existing cluster and replacing it with a new
one. During the window of unavailability, the Java driver attempts to reconnect
to each node, two of which internally (internal to the driver) have resolved IP
addresses and one of which retains the unresolved hostname. Here is a
screenshot that captures the internal state of the 3 nodes within `PoolManager`
prior to the finished redeployment of the cluster. Note that there are 2
resolved IP addresses and 1 unresolved hostname.
!image-2024-04-29-20-13-56-161.png|width=985,height=181!
This ratio of resolved IP:unresolved hostname is the correct internal state for
a 3 node cluster when `advanced.resolve-contact-points` is set to `FALSE`.
Eventually, the hostname points to one of the 3 new valid nodes, and the Java
driver reconnects and discovers the new peers. However, as part of this
reconnection process, the internal Node that held the unresolved hostname is
now overwritten with a Node that has the resolved IP address:
!image-2024-04-29-22-57-26-910.png|width=753,height=107!
Note that we no longer have 2 resolved IP addresses and 1 unresolved hostname;
rather, we have 3 resolved IP addresses, which is an incorrect internal state
when `advanced.resolve-contact-points` is set to `FALSE`. One of the nodes
should have retained the unresolved hostname.
At this stage, the Java driver no longer queries the hostname for new
connections, and our subsequent redeployments fail because the hostname is no
longer among the nodes that are tried for reconnection. We then have to
restart the application to recover.
> advanced.resolve-contact-points: unresolved hostname being clobbered during
> reconnection
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-19598
> URL: https://issues.apache.org/jira/browse/CASSANDRA-19598
> Project: Cassandra
> Issue Type: Bug
> Components: Client/java-driver
> Reporter: Andrew Orlowski
> Priority: Normal
> Attachments: image-2024-04-29-20-13-56-161.png,
> image-2024-04-29-20-40-53-382.png, image-2024-04-29-22-57-26-910.png