[ 
https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17904762#comment-17904762
 ] 

Sam Tunnicliffe commented on CASSANDRA-19488:
---------------------------------------------

Rebased, squashed and re-ran CI; there was only 1 failure, which is very likely
a CI infra issue. This is a sizeable patch and, although it should be
functionally transparent with no behavioural changes, it's quite challenging
to verify the behaviour of the various cloud platform specialisations. I'll
send another mail on the {{[DISCUSS]}} thread before merging but, barring any
interventions, I don't think there's anything else to do here.

> Ensure snitches always defer to ClusterMetadata
> -----------------------------------------------
>
>                 Key: CASSANDRA-19488
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Membership, Messaging/Internode, Transactional 
> Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: ci_summary-1.html, ci_summary-2.html, ci_summary.html
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Internally, C* always uses {{ClusterMetadata}} as the source of topology 
> information when calculating data placements, replica plans, etc., and as such 
> the role of the snitch has been somewhat reduced. 
> Sorting and comparison functions as provided by specialisations like 
> {{DynamicEndpointSnitch}} are still used, but the snitch should only be 
> responsible for providing the DC and rack for a new node when it first joins 
> a cluster.
> Aside from initial startup and registration, snitch implementations should 
> always defer to {{ClusterMetadata}} for DC and rack; otherwise there is a 
> risk that the snitch config drifts out of sync with TCM and output from tools 
> like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.
> A complication is that topology is used when opening connections to peers as 
> certain internode connection settings are variable at the DC level, so at the 
> time of connecting we want to check the location of the remote peer. Usually, 
> this is available from {{ClusterMetadata}}, but in the case of a brand 
> new node joining the cluster nothing is known a priori. The current 
> implementation assumes that the snitch will know the location of the new node 
> ahead of time, but in practice this is often not the case (though with 
> variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is 
> temporarily assigned a default DC. This is problematic as it can cause the 
> internode connection settings which depend on DC to be incorrectly set. 
> Internode connections are long lived and any established while the DC is 
> unknown (potentially with incorrect config) will persist indefinitely. This 
> particular issue is not directly related to TCM and is present in earlier 
> versions.
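
The lookup order described in the issue can be illustrated with a minimal,
self-contained sketch. This is not the patch and does not use Cassandra's real
classes: Location, MetadataView and DeferringSnitch are hypothetical stand-ins
invented for illustration. The choice to return an empty result for an unknown
remote peer, rather than a default DC, is likewise an assumption made here to
highlight the problem the description raises.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class DeferringSnitchSketch {

    /** DC and rack of a node (hypothetical type, not Cassandra's). */
    static final class Location {
        final String datacenter;
        final String rack;

        Location(String datacenter, String rack) {
            this.datacenter = datacenter;
            this.rack = rack;
        }

        @Override
        public String toString() {
            return datacenter + ":" + rack;
        }
    }

    /** Hypothetical stand-in for cluster metadata: endpoint -> registered location. */
    static final class MetadataView {
        private final Map<String, Location> registered = new ConcurrentHashMap<>();

        void register(String endpoint, Location location) {
            registered.put(endpoint, location);
        }

        Optional<Location> locationOf(String endpoint) {
            return Optional.ofNullable(registered.get(endpoint));
        }
    }

    /** A snitch that consults its own config only for the local node's first registration. */
    static final class DeferringSnitch {
        private final String localEndpoint;
        private final Location configuredLocal; // e.g. read from a properties file or cloud API
        private final MetadataView metadata;

        DeferringSnitch(String localEndpoint, Location configuredLocal, MetadataView metadata) {
            this.localEndpoint = localEndpoint;
            this.configuredLocal = configuredLocal;
            this.metadata = metadata;
        }

        /** Location the local node would supply when it first joins the cluster. */
        Location locationForRegistration() {
            return configuredLocal;
        }

        /**
         * Registered metadata always wins. Config is used only for the local node
         * before it has registered. Unknown remote peers yield empty (an assumption
         * of this sketch) so callers cannot silently act on a placeholder DC.
         */
        Optional<Location> locationOf(String endpoint) {
            Optional<Location> known = metadata.locationOf(endpoint);
            if (known.isPresent()) {
                return known;
            }
            return endpoint.equals(localEndpoint) ? Optional.of(configuredLocal) : Optional.empty();
        }
    }

    public static void main(String[] args) {
        MetadataView metadata = new MetadataView();
        // The node registered as dc1/rack1, but its snitch config later drifted to dc2/rack9.
        metadata.register("10.0.0.1", new Location("dc1", "rack1"));
        DeferringSnitch snitch =
            new DeferringSnitch("10.0.0.1", new Location("dc2", "rack9"), metadata);

        System.out.println(snitch.locationOf("10.0.0.1"));  // registered location, not the drifted config
        System.out.println(snitch.locationOf("10.0.0.99")); // peer not yet known to metadata
    }
}
```

In this sketch, a caller opening an internode connection would see the empty
result for a peer that has not yet registered and could defer any DC-dependent
settings, instead of baking a guessed DC into a long-lived connection.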



