[jira] [Commented] (CASSANDRA-10111) reconnecting snitch can bypass cluster name check

Joel Knighton (JIRA) Fri, 13 Nov 2015 08:59:40 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15004289#comment-15004289
 ]


Joel Knighton commented on CASSANDRA-10111:
-------------------------------------------

This can occur because we only check for cluster name mismatches in the 
{{GossipDigestSynVerbHandler}}.  In the original design of Cassandra, this was 
sufficient, since we always replied to the {{listen_address}}.

Since we now reply to the {{broadcast_address}}, the 
{{GossipDigestAckVerbHandler}} and the {{GossipDigestAck2VerbHandler}} also 
need to check {{clusterId}} for mismatches. {{GossipDigestAck}} and 
{{GossipDigestAck2}} don't contain {{clusterId}} currently, so we need to bump 
the {{MessagingService}} version to accommodate the addition of this field.
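
To illustrate why a new field implies a version bump, here is a minimal Python sketch (not Cassandra code). The version constants, the JSON encoding, and the names {{serialize_ack}} and {{accept_ack}} are all hypothetical and chosen only for illustration. The sketch assumes the negotiated version tells each side whether the ack carries a cluster id.

```python
import json

# Hypothetical version numbers; the real MessagingService constants differ.
VERSION_WITHOUT_CLUSTER_ID = 1
VERSION_WITH_CLUSTER_ID = 2


def serialize_ack(digests, cluster_id, version):
    """Encode an ack; the cluster id is written only for peers that expect it."""
    payload = {"digests": digests}
    if version >= VERSION_WITH_CLUSTER_ID:
        payload["cluster_id"] = cluster_id
    return json.dumps(payload).encode("utf-8")


def accept_ack(raw, local_cluster_id, version):
    """Return the digests, or None if the ack comes from another cluster."""
    payload = json.loads(raw.decode("utf-8"))
    if version >= VERSION_WITH_CLUSTER_ID:
        if payload["cluster_id"] != local_cluster_id:
            return None  # cluster mismatch: ignore the ack
    # Old-version peers send no cluster id, so nothing can be checked here.
    return payload["digests"]
```

In this toy model, an ack exchanged at the older version is still accepted by a node from another cluster, which is why the check only becomes effective once both sides speak the newer version.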

The reason this metadata contamination is unidirectional is as follows:
1. The new node (in cluster A) sends a {{GossipDigestSyn}} asking for all info.
2. A node from cluster A replies to the new node's broadcast address, which 
belongs to a cluster B node, adding info for all nodes from cluster A and 
asking for no info.
3. The cluster B node doesn't share cluster B data since it hasn't been requested.

All subsequent direct gossiping between the two clusters is blocked by the 
{{GossipDigestSynVerbHandler}}.
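
The exchange above can be modelled with a small Python simulation. This is a toy sketch under stated assumptions, not the actual handler code: the {{Node}} class, its methods, the {{check_ack}} flag, and the node names are hypothetical, endpoint state is reduced to a set of names, and the new node's syn is modelled as carrying no digests.

```python
class Node:
    """Toy gossip participant; all names here are hypothetical."""

    def __init__(self, name, cluster, check_ack=False):
        self.name = name
        self.cluster = cluster
        self.check_ack = check_ack  # True models the proposed fix
        self.known = {name}  # endpoints this node believes are in its cluster

    def on_syn(self, syn_cluster, known_by_sender):
        """Handle a syn: the cluster name is checked here, as it is today."""
        if syn_cluster != self.cluster:
            return None  # 'ClusterName mismatch': drop the message
        # The ack carries what the sender lacks and requests what we lack.
        return {
            "cluster": self.cluster,
            "states": self.known - known_by_sender,
            "requested": known_by_sender - self.known,
        }

    def on_ack(self, ack):
        """Handle an ack: the cluster is checked only when check_ack is set."""
        if self.check_ack and ack["cluster"] != self.cluster:
            return set()
        self.known |= ack["states"]
        # The ack2 would carry only the states that were requested.
        return self.known & ack["requested"]


def misrouted_exchange(check_ack):
    a1 = Node("a1", "A")
    a1.known |= {"a2"}
    b1 = Node("b1", "B", check_ack=check_ack)
    b1.known |= {"b2"}
    # 1. The new cluster A node sends a syn with no digests (asks for all info).
    ack = a1.on_syn("A", set())
    # 2. The ack goes to the new node's broadcast address, which is really b1.
    ack2_states = b1.on_ack(ack)
    # 3. b1 answers with only what was requested, which is nothing.
    return b1.known, ack2_states
```

Calling {{misrouted_exchange(check_ack=False)}} leaves the cluster B node knowing all four endpoints while sending nothing back; with {{check_ack=True}} it keeps only its own cluster's endpoints.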

I have a working fix for this; we need to decide when a {{MessagingService}} 
bump will occur.

Thanks for the report!

> reconnecting snitch can bypass cluster name check
> -------------------------------------------------
>
>                 Key: CASSANDRA-10111
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10111
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>         Environment: 2.0.x
>            Reporter: Chris Burroughs
>            Assignee: Joel Knighton
>              Labels: gossip
>             Fix For: 2.1.x
>
>
> Setup:
>  * Two clusters: A & B
>  * Both are two-DC clusters
>  * Both use GossipingPropertyFileSnitch with different 
> listen_address/broadcast_address
> A new node was added to cluster A with a broadcast_address of an existing 
> node in cluster B (due to an out-of-date DNS entry).  Cluster B added all of 
> the nodes from cluster A, somehow bypassing the cluster name mismatch check 
> for these nodes.  The first reference to cluster A nodes in cluster B logs is 
> when they were added:
> {noformat}
>  INFO [GossipStage:1] 2015-08-17 15:08:33,858 Gossiper.java (line 983) Node 
> /8.37.70.168 is now part of the cluster
> {noformat}
> Cluster B nodes then tried to gossip to cluster A nodes, but cluster A kept 
> them out with 'ClusterName mismatch'.  Cluster B, however, tried to send 
> reads/writes to cluster A and general mayhem ensued.
> Obviously this is a Bad (TM) config that Should Not Be Done.  However, since 
> the consequences of crazy merged clusters are really bad (the reason there is 
> the name mismatch check in the first place), I think the hole is reasonable to 
> plug.  I'm not sure exactly what the code path is that skips the check in 
> GossipDigestSynVerbHandler.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
