[jira] [Updated] (CASSANDRA-19598) advanced.resolve-contact-points: unresolved hostname being clobbered during reconnection

Andrew Orlowski (Jira) Mon, 29 Apr 2024 21:03:04 -0700


     [ 
https://issues.apache.org/jira/browse/CASSANDRA-19598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Andrew Orlowski updated CASSANDRA-19598:
----------------------------------------
    Description: 
Hello, this is a bug ticket for 4.18.0 of the Java driver.

 

I am running in an environment with 3 Cassandra nodes. Our use case is to 
redeploy the cluster from the ground up at midnight every day. This means that 
all 3 nodes become unavailable for a short period of time, and 3 new nodes with 
3 new IP addresses are spun up and placed behind the contact point hostname. We 
provide a single hostname as the contact point. If you set 
{{advanced.resolve-contact-points}} to FALSE, the Java driver should re-resolve 
the hostname for every new connection to that node. This happens before and 
during the first redeployment, but the unresolved hostname is then clobbered 
during the reconnection process and replaced with a resolved IP address, so 
later redeployments fail to reconnect.
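
To illustrate the expected behaviour described above, here is a minimal sketch that uses only the JDK and is not driver code. It assumes, as far as I can tell from the driver's reference configuration, that an unresolved contact point is an `InetSocketAddress` built with `createUnresolved`, and that re-resolving means building a fresh address from the stored host name at connect time. The class name, the helper `addressForNewConnection`, and the use of `localhost` are made up for this example.

```java
import java.net.InetSocketAddress;

public class ContactPointSketch {

    // Hypothetical helper: what "re-resolve for every new connection" amounts to.
    static InetSocketAddress addressForNewConnection(InetSocketAddress endpoint) {
        if (!endpoint.isUnresolved()) {
            // Already pinned to an IP: DNS is never consulted again for this endpoint.
            return endpoint;
        }
        // This constructor attempts a lookup of the host name each time it is called.
        return new InetSocketAddress(endpoint.getHostString(), endpoint.getPort());
    }

    public static void main(String[] args) {
        // "localhost" stands in for the real contact point hostname.
        InetSocketAddress contactPoint = InetSocketAddress.createUnresolved("localhost", 9042);
        InetSocketAddress forConnection = addressForNewConnection(contactPoint);

        // The stored endpoint stays unresolved; only the per-connection copy is looked up.
        System.out.println("stored unresolved:     " + contactPoint.isUnresolved());
        System.out.println("connection unresolved: " + forConnection.isUnresolved());
    }
}
```

The point of the sketch is that the stored endpoint must remain unresolved for a later lookup to pick up new IP addresses. Once a resolved address is stored in its place, the first branch is taken forever.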

 

In our case, all 3 nodes become unavailable while our CI/CD process destroys 
the existing cluster and replaces it with a new one. During this window of 
unavailability, the Java driver attempts to reconnect to each node. Internally 
to the driver, two of these nodes have resolved IP addresses and one retains 
the unresolved hostname. Here is a screenshot that captures the internal state 
of the 3 nodes within `PoolManager` before the redeployment of the cluster has 
finished. Note that there are 2 resolved IP addresses and 1 unresolved hostname.

!image-2024-04-29-20-13-56-161.png|width=985,height=181!

This 2:1 ratio of resolved IP:unresolved hostname is the correct internal state 
for a 3-node cluster when `advanced.resolve-contact-points` is set to `FALSE`.

Eventually, the hostname points to one of the 3 new valid nodes, and the Java 
driver reconnects and discovers the new peers. However, as part of this 
reconnection process, the internal Node that held the unresolved hostname is 
overwritten with a Node that has the resolved IP address:
!image-2024-04-29-22-57-26-910.png|width=753,height=107!
Note that we no longer have 2 resolved IP addresses and 1 unresolved hostname; 
rather, we have 3 resolved IP addresses, which is an incorrect internal state 
when `advanced.resolve-contact-points` is set to `FALSE`. One of the nodes 
should have retained the unresolved hostname.
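
The two internal states above can be modelled with a small JDK-only sketch. This is a toy model and not the driver's actual node-refresh logic: the class, the method names `refreshClobbering` and `refreshPreserving`, the hostname `cassandra.example.internal`, and the 10.0.1.x addresses are all hypothetical. It only shows the invariant I would expect to hold, namely that one endpoint stays unresolved after the peers are rediscovered.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class EndpointStateSketch {

    static long countUnresolved(List<InetSocketAddress> endpoints) {
        return endpoints.stream().filter(InetSocketAddress::isUnresolved).count();
    }

    // Builds a resolved endpoint from raw IPv4 octets, with no DNS lookup involved.
    static InetSocketAddress resolved(int a, int b, int c, int d) {
        try {
            byte[] octets = {(byte) a, (byte) b, (byte) c, (byte) d};
            return new InetSocketAddress(InetAddress.getByAddress(octets), 9042);
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e); // only thrown for a wrong array length
        }
    }

    // Observed outcome: every endpoint, including the contact point, ends up resolved.
    static List<InetSocketAddress> refreshClobbering(List<InetSocketAddress> discoveredPeers) {
        return new ArrayList<>(discoveredPeers);
    }

    // Expected outcome: the peer reached through the contact point keeps the unresolved hostname.
    static List<InetSocketAddress> refreshPreserving(
            List<InetSocketAddress> discoveredPeers,
            InetSocketAddress contactPoint,
            InetSocketAddress peerBehindContactPoint) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (InetSocketAddress peer : discoveredPeers) {
            result.add(peer.equals(peerBehindContactPoint) ? contactPoint : peer);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical contact point; createUnresolved performs no DNS lookup.
        InetSocketAddress contactPoint =
                InetSocketAddress.createUnresolved("cassandra.example.internal", 9042);
        List<InetSocketAddress> newPeers =
                List.of(resolved(10, 0, 1, 1), resolved(10, 0, 1, 2), resolved(10, 0, 1, 3));

        long clobbered = countUnresolved(refreshClobbering(newPeers));
        long preserved =
                countUnresolved(refreshPreserving(newPeers, contactPoint, newPeers.get(0)));
        System.out.println("unresolved after clobbering refresh: " + clobbered);
        System.out.println("unresolved after preserving refresh: " + preserved);
    }
}
```

In this model the clobbering refresh leaves 0 unresolved endpoints, which matches the second screenshot, while the preserving refresh leaves 1, which matches the first.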

At this stage, the Java driver no longer queries the hostname for new 
connections, and our subsequent redeployments fail because the hostname is no 
longer among the nodes that are queried for reconnection. This forces us to 
restart the application.


> advanced.resolve-contact-points: unresolved hostname being clobbered during 
> reconnection
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-19598
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19598
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Client/java-driver
>            Reporter: Andrew Orlowski
>            Priority: Normal
>         Attachments: image-2024-04-29-20-13-56-161.png, 
> image-2024-04-29-20-40-53-382.png, image-2024-04-29-22-57-26-910.png
>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

