[
https://issues.apache.org/jira/browse/CASSANDRA-15878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147114#comment-17147114
]
Joey Lynch commented on CASSANDRA-15878:
----------------------------------------
Hi [~adejanovski], thanks for the mention! I will try to page the context for
this change back in and review it this weekend. Feel free to add me as a
reviewer (although I see mck already got to it :-) ).
> Ec2Snitch fails on upgrade in legacy mode
> -----------------------------------------
>
> Key: CASSANDRA-15878
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15878
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Distributed Metadata
> Reporter: Alexander Dejanovski
> Assignee: Alexander Dejanovski
> Priority: Normal
> Fix For: 4.0-beta
>
>
> CASSANDRA-7839 changed the way the EC2 DC/Rack naming was handled in the
> Ec2Snitch to match AWS conventions.
> The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x and
> keep the same naming as before (while the "standard" mode uses the new naming
> convention).
> When performing an upgrade in the us-west-2 region, the second node failed to
> start with the following exception:
>
> {code:java}
> ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled snitch appears to be using the legacy naming scheme for regions, but existing nodes in cluster are using the opposite: region(s) = [us-west-2], availability zone(s) = [2a]. Please check the ec2_naming_scheme property in the cassandra-rackdc.properties configuration file for more details.
> ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception encountered during startup
> java.lang.IllegalStateException: null
>     at org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573)
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530)
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:659)
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:610)
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373)
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
> {code}
>
> The exception leads back to [this piece of
> code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185].
> After adding some logging, it turned out the DC name of the first upgraded
> node was considered invalid as a legacy one:
> {code:java}
> INFO [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC us-west-2
> INFO [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - dcUsesLegacyFormat=false / usingLegacyNaming=true
> ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name us-west-2
> {code}
>
> The problem is that the regex used to identify legacy DC names matches both
> old and new names:
> {code:java}
> boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*");
> {code}
> Knowing that some dc names didn't change between the two modes (us-west-2 for
> example), I don't see how we can use the dc names to detect if the legacy
> mode is being used by other nodes in the cluster.
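
A minimal sketch of the ambiguity described above, using the regex exactly as quoted in the description. The class name `DcRegexCheck` is hypothetical, and the sample name "us-east" is an assumption about what a legacy DC name without a numeric suffix looks like; only "us-west-2" comes from the report.

```java
// Hypothetical standalone check; not part of Cassandra.
class DcRegexCheck {
    // Same expression as in the quoted Ec2Snitch snippet.
    static boolean dcUsesLegacyFormat(String dc) {
        return !dc.matches("[a-z]+-[a-z].+-[\\d].*");
    }

    public static void main(String[] args) {
        // "us-west-2" is spelled the same in both modes, so the regex
        // cannot tell which scheme produced it.
        System.out.println("us-west-2 -> " + dcUsesLegacyFormat("us-west-2")); // false
        // Assumed legacy-style name with no trailing digit.
        System.out.println("us-east   -> " + dcUsesLegacyFormat("us-east"));   // true
    }
}
```

With `usingLegacyNaming=true`, a result of `false` for "us-west-2" is what produces the mismatch shown in the log output above.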
>
> The rack names on the other hand are totally different in the legacy and
> standard modes and can be used to detect mismatching settings.
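
One way a rack-based check could look, as a rough sketch rather than the actual patch. It assumes legacy racks are a digit followed by a zone letter ("2a", as in the error message above) and that standard racks carry the full availability zone name (for example "us-west-2a"). The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;

// Hypothetical helper; not part of Cassandra.
class RackNamingCheck {
    // Assumption: a legacy rack name is exactly one digit plus one letter.
    static boolean rackUsesLegacyFormat(String rack) {
        return rack.matches("\\d[a-z]");
    }

    // Returns true only if every known rack agrees with the local setting.
    static boolean racksMatchScheme(Collection<String> racks, boolean usingLegacyNaming) {
        for (String rack : racks) {
            if (rackUsesLegacyFormat(rack) != usingLegacyNaming) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(racksMatchScheme(Arrays.asList("2a", "2b"), true));   // true
        System.out.println(racksMatchScheme(Arrays.asList("us-west-2a"), true)); // false
    }
}
```

Because the two rack formats never overlap under these assumptions, the check does not depend on the DC name at all.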
>
> My go-to fix would be to drop the check on datacenters by removing the
> following lines:
> [https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186]