[
https://issues.apache.org/jira/browse/CASSANDRA-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004289#comment-15004289
]
Joel Knighton commented on CASSANDRA-10111:
-------------------------------------------
This can occur because we only check for cluster name mismatches in the
{{GossipDigestSynVerbHandler}}. In the original design of Cassandra, this was
sufficient, since we always replied to the {{listen_address}}.
Since we now reply to the {{broadcast_address}}, the
{{GossipDigestAckVerbHandler}} and the {{GossipDigestAck2VerbHandler}} also
need to check {{clusterId}} for mismatches. {{GossipDigestAck}} and
{{GossipDigestAck2}} don't contain {{clusterId}} currently, so we need to bump
the {{MessagingService}} version to accommodate the addition of this field.
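To illustrate the kind of check described above, here is a minimal sketch. It is not the actual patch: {{AckWithClusterId}}, {{AckHandlerSketch}}, and the simplified endpoint-state map are hypothetical names invented for this example. It assumes the ack message has been extended to carry the sender's {{clusterId}}.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an ack message that also carries the sender's cluster id.
final class AckWithClusterId {
    final String clusterId;
    final Map<String, String> endpointStates; // endpoint -> state, heavily simplified

    AckWithClusterId(String clusterId, Map<String, String> endpointStates) {
        this.clusterId = clusterId;
        this.endpointStates = endpointStates;
    }
}

// Hypothetical handler: drops acks from a different cluster before applying any state.
final class AckHandlerSketch {
    private final String localClusterId;
    private final Map<String, String> knownEndpoints = new HashMap<>();

    AckHandlerSketch(String localClusterId) {
        this.localClusterId = localClusterId;
    }

    // Returns true if the ack was applied, false if it was rejected as a mismatch.
    boolean handle(AckWithClusterId ack) {
        if (!localClusterId.equals(ack.clusterId)) {
            return false; // cluster mismatch: ignore the message entirely
        }
        knownEndpoints.putAll(ack.endpointStates);
        return true;
    }

    Map<String, String> knownEndpoints() {
        return Collections.unmodifiableMap(knownEndpoints);
    }
}
```

With a check like this, an ack that arrives from the wrong cluster leaves the receiver's endpoint state untouched.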
The reason this metadata contamination is unidirectional is as follows:
1. New node sends {{GossipDigestSyn}} asking for all info.
2. A node from cluster A replies to the cluster B node that shares the new
node's broadcast address, sending info for all cluster A nodes and requesting none.
3. Cluster B node doesn't share cluster B data since it hasn't been requested.
All subsequent direct gossiping between the two clusters is blocked by the
{{GossipDigestSynVerbHandler}}.
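The three steps above can be modelled with a toy simulation. This is only a sketch under simplifying assumptions: {{GossipContaminationSketch}}, {{Node}}, and the endpoint labels {{a1}}, {{a2}}, {{b1}}, {{b2}} are hypothetical, and the real gossip messages carry digests and versioned state rather than a plain map.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GossipContaminationSketch {

    static final class Node {
        final String cluster;
        // endpoint -> cluster it belongs to (a stand-in for real endpoint state)
        final Map<String, String> known = new TreeMap<>();

        Node(String cluster, String... endpoints) {
            this.cluster = cluster;
            for (String e : endpoints) {
                known.put(e, cluster);
            }
        }

        // Syn handling: the only place the cluster name is checked.
        // Returns the ack payload, or null if the sender is from another cluster.
        Map<String, String> onSyn(String senderCluster) {
            if (!cluster.equals(senderCluster)) {
                return null;
            }
            return new TreeMap<>(known);
        }

        // Ack handling: no cluster check. Applies the received state and
        // returns an Ack2 payload holding only the endpoints that were requested.
        Map<String, String> onAck(Map<String, String> states, List<String> requested) {
            known.putAll(states);
            Map<String, String> ack2 = new TreeMap<>();
            for (String e : requested) {
                if (known.containsKey(e)) {
                    ack2.put(e, known.get(e));
                }
            }
            return ack2;
        }
    }

    public static void main(String[] args) {
        Node seedA = new Node("A", "a1", "a2");
        Node nodeB = new Node("B", "b1", "b2");

        // Step 1: the new cluster A node sends a syn; the name matches, so it is answered.
        Map<String, String> ack = seedA.onSyn("A");

        // Step 2: the ack goes to the broadcast address, which is really nodeB.
        // It carries all of cluster A's state and requests nothing.
        Map<String, String> ack2 = nodeB.onAck(ack, Collections.<String>emptyList());

        // Step 3: nothing was requested, so no cluster B state flows back.
        System.out.println("B knows: " + nodeB.known.keySet());
        System.out.println("Ack2 payload: " + ack2.keySet());

        // Later direct gossip from B is stopped by the syn-side check.
        System.out.println("B syn accepted by A: " + (seedA.onSyn(nodeB.cluster) != null));
    }
}
```

In this model cluster B ends up knowing cluster A's endpoints while cluster A learns nothing about cluster B, which matches the one-way contamination in the report.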
I have a working fix for this; we need to decide when a {{MessagingService}}
bump will occur.
Thanks for the report!
> reconnecting snitch can bypass cluster name check
> -------------------------------------------------
>
> Key: CASSANDRA-10111
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10111
> Project: Cassandra
> Issue Type: Bug
> Components: Distributed Metadata
> Environment: 2.0.x
> Reporter: Chris Burroughs
> Assignee: Joel Knighton
> Labels: gossip
> Fix For: 2.1.x
>
>
> Setup:
> * Two clusters: A & B
> * Both are two-DC clusters
> * Both use GossipingPropertyFileSnitch with different
> listen_address/broadcast_address
> A new node was added to cluster A with a broadcast_address of an existing
> node in cluster B (due to an out-of-date DNS entry). Cluster B added all of
> the nodes from cluster A, somehow bypassing the cluster name mismatch check
> for these nodes. The first reference to cluster A nodes in cluster B logs is
> when they were added:
> {noformat}
> INFO [GossipStage:1] 2015-08-17 15:08:33,858 Gossiper.java (line 983) Node
> /8.37.70.168 is now part of the cluster
> {noformat}
> Cluster B nodes then tried to gossip to cluster A nodes, but cluster A kept
> them out with 'ClusterName mismatch'. Cluster B, however, tried to send
> reads/writes to cluster A and general mayhem ensued.
> Obviously this is a Bad (TM) config that Should Not Be Done. However, since
> the consequences of crazy merged clusters are really bad (the reason there is
> the name mismatch check in the first place) I think the hole is reasonable to
> plug. I'm not sure exactly what the code path is that skips the check in
> GossipDigestSynVerbHandler.