[
https://issues.apache.org/jira/browse/CASSANDRA-20910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18061777#comment-18061777
]
Arup Chauhan edited comment on CASSANDRA-20910 at 2/28/26 6:13 AM:
-------------------------------------------------------------------
Hi [~bereng], update:
I have opened a companion PR with regression coverage.
It verifies that a foreign gossip SYN (cluster-name or partitioner mismatch) is
rejected and that the sending endpoint is not admitted into membership state.
If you are okay with it, I would like to help with the production fix by adding
or auditing a final check at the membership admission point so a foreign node
can never transition into membership, even if a specific gossip handler path
misses the check.
Please point me to the preferred place to enforce this, and the branch you want
me to target (4.1 first or trunk).
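A minimal sketch of the kind of final admission check described above, for
discussion only. It is not Cassandra code: MembershipGuard, admit and isMember
are hypothetical names, the endpoint strings are made up, and the real
admission point and state types would differ. The idea is that every path into
membership goes through one method that compares the peer's claimed cluster
name and partitioner with the local ones.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class MembershipGuard {
    private final String localClusterName;
    private final String localPartitioner;
    // Endpoints that have passed the admission check.
    private final Set<String> members = ConcurrentHashMap.newKeySet();

    MembershipGuard(String localClusterName, String localPartitioner) {
        this.localClusterName = localClusterName;
        this.localPartitioner = localPartitioner;
    }

    // Single admission point: returns true and records the endpoint only if
    // the claimed identity matches the local cluster name and partitioner.
    // A missing (null) claim is treated as foreign and rejected.
    boolean admit(String endpoint, String claimedClusterName, String claimedPartitioner) {
        boolean sameCluster = localClusterName.equals(claimedClusterName);
        boolean samePartitioner = localPartitioner.equals(claimedPartitioner);
        if (!sameCluster || !samePartitioner) {
            return false; // foreign node: never enters membership state
        }
        members.add(endpoint);
        return true;
    }

    boolean isMember(String endpoint) {
        return members.contains(endpoint);
    }
}
```

Because the check sits at the one place where an endpoint is recorded, a
handler path that skips its own validation still cannot add a foreign node.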
> Instances from a 2nd ring join another ring when running on the same nodes
> ---------------------------------------------------------------------------
>
> Key: CASSANDRA-20910
> URL: https://issues.apache.org/jira/browse/CASSANDRA-20910
> Project: Apache Cassandra
> Issue Type: Bug
> Components: Cluster/Membership
> Reporter: Chris Miller
> Assignee: Arup Chauhan
> Priority: Urgent
> Fix For: 4.1.x, 5.0.x, 6.x
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> Hi,
> We experienced an issue today in which instances from a second ring joined
> another ring running on the same nodes. It happened after a rolling restart
> that followed an OS patch and node reboot (both on Cassandra 4.1.2).
> The cluster names and storage ports are different, and this type of activity
> normally runs without issue.
> Any ideas as to what could have happened? Could this be a bug?
> The seeds use the same IP addresses, but no storage port is configured in the
> seeds parameter. Should we add the storage port to prevent this from
> happening again? Any thoughts?
> Messages like the following could be seen on ring 1.
> INFO [GossipStage:1] 2025-09-18 04:11:49,040 Gossiper.java:1434 - Node
> /XX.XX.XX.190:7002 is now part of the cluster
> INFO [GossipStage:1] 2025-09-18 04:11:49,043 TokenMetadata.java:539 -
> Updating topology for /XX.XX.XX.190:7002
> INFO [Messaging-EventLoop-3-8] 2025-09-18 04:11:49,044
> OutboundConnection.java:1153 -
> /XX.XX.XX.61:7000(/XX.XX.XX.61:41920)->/XX.XX.XX.190:7002-URGENT_MESSAGES-7af53583
> successfully connected, version = 12, framing = CRC, encryption = unencrypted
> INFO [GossipStage:1] 2025-09-18 04:11:49,044 TokenMetadata.java:539 -
> Updating topology for /XX.XX.XX.190:7002
> INFO [GossipStage:1] 2025-09-18 04:11:49,044 Gossiper.java:1434 - Node
> /XX.XX.XX.214:7002 is now part of the cluster
> INFO [Messaging-EventLoop-3-3] 2025-09-18 04:11:49,046
> OutboundConnection.java:1153 -
> /XX.XX.XX.61:7000(/XX.XX.XX.61:62628)->/XX.XX.XX.214:7002-URGENT_MESSAGES-0515b24a
> successfully connected, version = 12, framing = CRC, encryption = unencrypted
> INFO [GossipStage:1] 2025-09-18 04:11:49,046 TokenMetadata.java:539 -
> Updating topology for /XX.XX.XX.214:7002
> INFO [GossipStage:1] 2025-09-18 04:11:49,046 TokenMetadata.java:539 -
> Updating topology for /XX.XX.XX.214:7002
> INFO [GossipStage:1] 2025-09-18 04:11:49,047 Gossiper.java:1434 - Node
> /XX.XX.XX.247:7002 is now part of the cluster
> INFO [Messaging-EventLoop-3-4] 2025-09-18 04:11:49,048
> InboundConnectionInitiator.java:529 -
> /XX.XX.XX.190:7002(/XX.XX.XX.190:60180)->/XX.XX.XX.61:7000-URGENT_MESSAGES-edfb2d8f
> messaging connection established, version = 12, framing = LZ4, encryption =
> unencrypted
> Messages like the following in ring 2:
> WARN [GossipStage:1] 2025-09-18 04:11:49,304
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.247:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:49,819
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.108:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:51,598
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.190:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:52,361
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.111:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:53,489
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.84:7000 ring1!=ring2
> WARN [GossipStage:1] 2025-09-18 04:11:58,322
> GossipDigestSynVerbHandler.java:58 - ClusterName mismatch from
> /XX.XX.XX.247:7000 ring1!=ring2
> Instances from ring2 were listed in nodetool describecluster as unreachable
> under schema versions.
> They were also listed as DN under nodetool status.
> The nodetool removenode command was used to remove the instances successfully.
> Regards,
> Chris.