[ https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17906292#comment-17906292 ]

Stefan Miklosovic edited comment on CASSANDRA-19488 at 12/17/24 7:43 AM:
-------------------------------------------------------------------------

[~samt] [~marcuse]

I am not sure what's going on, but a lot of tests in trunk have been failing 
since the merge, visible in CI here (1). My CircleCI job (2) on top of that 
change fails too, so it does not seem like a blip. It is basically this all 
over again:

{code}
	at org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:281)
	at org.apache.cassandra.config.DatabaseDescriptor.daemonInitialization(DatabaseDescriptor.java:267)
	at org.apache.cassandra.service.CassandraDaemon.applyConfig(CassandraDaemon.java:781)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:724)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:882)
[node3] ERROR [main] 2024-12-16 22:49:52,483 CassandraDaemon.java:904 - Exception encountered during startup
org.apache.cassandra.exceptions.ConfigurationException: Configuration must specify either node_proximity and initial_location_provider or endpoint_snitch but not both.
	at org.apache.cassandra.config.DatabaseDescriptor.applySnitch(DatabaseDescriptor.java:1504)
	at org.apache.cassandra.config.DatabaseDescriptor.applyAll(DatabaseDescriptor.java:530)
{code}
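The exception message states a mutual-exclusion rule between the legacy {{endpoint_snitch}} setting and the two newer options. A minimal Java sketch of that rule follows. It is not Cassandra's {{DatabaseDescriptor.applySnitch}} code; the class and method names are hypothetical, and it assumes that setting {{endpoint_snitch}} together with either of the new options counts as a conflict. It also shows one plausible way a test harness could hit the error: layering {{endpoint_snitch}} onto a cassandra.yaml that already sets the new options.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the rule in the ConfigurationException message; not
 * Cassandra's DatabaseDescriptor code. Keys mirror the cassandra.yaml option names.
 */
final class SnitchConfigCheck {

    /**
     * Returns true when the legacy endpoint_snitch is configured alongside either
     * of the newer options. Assumption: a partial new-style pair alongside a
     * snitch also counts as a conflict.
     */
    static boolean conflicts(Map<String, String> settings) {
        boolean legacySnitch = settings.containsKey("endpoint_snitch");
        boolean newStyle = settings.containsKey("node_proximity")
                || settings.containsKey("initial_location_provider");
        return legacySnitch && newStyle;
    }

    public static void main(String[] args) {
        // Hypothetical values: a base cassandra.yaml that already sets the new
        // options, with a test harness layering endpoint_snitch on top of it.
        Map<String, String> merged = Map.of(
                "node_proximity", "ExampleProximity",
                "initial_location_provider", "ExampleLocationProvider",
                "endpoint_snitch", "GossipingPropertyFileSnitch");
        System.out.println(conflicts(merged)); // true: both styles present

        // Only the legacy setting: no conflict under this rule.
        System.out.println(conflicts(Map.of("endpoint_snitch", "GossipingPropertyFileSnitch"))); // false
    }
}
```

Under this reading, a config that sets only the new pair, or only {{endpoint_snitch}}, passes; the failure appears only when both styles end up in the same effective configuration.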

I suspect you ran this in your CI against a tweaked cassandra-dtest, or has 
something not been merged into cassandra-dtest yet?
(1) https://ci-cassandra.apache.org/job/Cassandra-trunk/1963/
(2) https://app.circleci.com/pipelines/github/instaclustr/cassandra/5136/workflows/05972ac6-1bf0-4720-ac5f-75bc99eff898/jobs/314091/tests



> Ensure snitches always defer to ClusterMetadata
> -----------------------------------------------
>
>                 Key: CASSANDRA-19488
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
>             Project: Apache Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Membership, Messaging/Internode, Transactional Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>
>         Attachments: ci_summary.html
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> Internally, C* always uses {{ClusterMetadata}} as the source of topology 
> information when calculating data placements, replica plans, etc., and as 
> such the role of the snitch has been somewhat reduced. 
> Sorting and comparison functions as provided by specialisations like 
> {{DynamicEndpointSnitch}} are still used, but the snitch should only be 
> responsible for providing the DC and rack for a new node when it first joins 
> a cluster.
> Aside from initial startup and registration, snitch implementations should 
> always defer to {{ClusterMetadata}} for DC and rack; otherwise there is a 
> risk that the snitch config drifts out of sync with TCM and that output from 
> tools like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.
> A complication is that topology is used when opening connections to peers, 
> because certain internode connection settings can vary by DC, so at the 
> time of connecting we want to check the location of the remote peer. 
> Usually, this is available from {{ClusterMetadata}}, but in the case of a 
> brand-new node joining the cluster nothing is known a priori. The current 
> implementation assumes that the snitch will know the location of the new 
> node ahead of time, but in practice this is often not the case (though with 
> variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is 
> temporarily assigned a default DC. This is problematic because it can cause 
> internode connection settings that depend on DC to be set incorrectly. 
> Internode connections are long-lived, and any connection established while 
> the DC is unknown (potentially with incorrect config) will persist 
> indefinitely. This particular issue is not directly related to TCM and is 
> present in earlier versions.
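The issue description boils down to a lookup order: committed cluster metadata is authoritative for a node's DC and rack, and the snitch's configured location matters only for the local node before it first registers. The Java sketch below models that order with hypothetical names ({{LocationResolver}}, {{register}}, {{locationOf}}); a plain map stands in for {{ClusterMetadata}}, and it is not the patch's actual implementation. Returning an empty result for an unregistered peer, rather than a default DC, is this sketch's own choice.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch: committed topology (a plain map standing in for
 * ClusterMetadata) is the source of truth for DC and rack; the snitch-style
 * configured location is only used for the local node before it registers.
 */
final class LocationResolver {
    /** Hypothetical DC/rack pair. */
    record Location(String datacenter, String rack) {}

    private final Map<String, Location> committed = new HashMap<>(); // endpoint -> committed location
    private final String localEndpoint;
    private final Location configuredLocal; // what local snitch config claims for this node

    LocationResolver(String localEndpoint, Location configuredLocal) {
        this.localEndpoint = localEndpoint;
        this.configuredLocal = configuredLocal;
    }

    /** Stand-in for a node's location being committed to cluster metadata at registration. */
    void register(String endpoint, Location location) {
        committed.put(endpoint, location);
    }

    Optional<Location> locationOf(String endpoint) {
        Location fromMetadata = committed.get(endpoint);
        if (fromMetadata != null)
            return Optional.of(fromMetadata);    // always defer to committed metadata
        if (endpoint.equals(localEndpoint))
            return Optional.of(configuredLocal); // initial startup, before registration
        return Optional.empty();                 // unknown peer: no default DC is guessed
    }

    public static void main(String[] args) {
        LocationResolver resolver = new LocationResolver("10.0.0.1", new Location("dc2", "rack9"));
        System.out.println(resolver.locationOf("10.0.0.1")); // configured location before registration
        resolver.register("10.0.0.1", new Location("dc1", "rack1"));
        // Local config and committed metadata now disagree; the committed location wins.
        System.out.println(resolver.locationOf("10.0.0.1"));
        System.out.println(resolver.locationOf("10.0.0.2")); // Optional.empty
    }
}
```

Because every lookup after registration goes through the committed map, editing the local snitch config later cannot make this resolver report a different DC than the one in metadata, which is the drift the description warns about.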

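The connection problem described in the issue can be reproduced in miniature: a per-DC setting is resolved once, when a connection is opened, so a connection opened while the peer's DC was unknown keeps the default DC's value even after the real DC is learned. In this hypothetical Java illustration the DC names, the {{DEFAULT_DC}} placeholder and the per-DC compression flag are all assumptions; it only models the failure mode and does not show how the issue was fixed.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical illustration: a per-DC internode setting is captured when a
 * connection is opened, so a peer whose DC was unknown at that moment keeps
 * the default DC's value for the lifetime of the connection.
 */
final class StaleConnectionSettings {
    static final String DEFAULT_DC = "default_dc"; // hypothetical placeholder DC name

    /** Hypothetical per-DC setting, e.g. whether to compress traffic to that DC. */
    static final Map<String, Boolean> COMPRESS_BY_DC =
            Map.of(DEFAULT_DC, false, "dc1", false, "dc2", true);

    /** A long-lived connection: its settings are fixed at construction. */
    record Connection(String peer, boolean compress) {}

    static Connection open(String peer, Map<String, String> knownDcs) {
        // Falls back to a default DC when the peer's location is not yet known.
        String dc = knownDcs.getOrDefault(peer, DEFAULT_DC);
        return new Connection(peer, COMPRESS_BY_DC.get(dc));
    }

    public static void main(String[] args) {
        Map<String, String> knownDcs = new HashMap<>();
        Connection early = open("10.0.0.5", knownDcs); // new node, DC not yet known
        knownDcs.put("10.0.0.5", "dc2");                // DC learned later
        Connection late = open("10.0.0.5", knownDcs);
        System.out.println(early.compress() + " vs " + late.compress()); // false vs true
    }
}
```

The {{early}} connection never picks up the {{dc2}} value because nothing re-evaluates its settings; only a newly opened connection sees the correct DC.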


--
This message was sent by Atlassian Jira
(v8.20.10#820010)
