Alexander Dejanovski created CASSANDRA-15878:
------------------------------------------------
Summary: Ec2Snitch fails on upgrade in legacy mode
Key: CASSANDRA-15878
URL: https://issues.apache.org/jira/browse/CASSANDRA-15878
Project: Cassandra
Issue Type: Bug
Reporter: Alexander Dejanovski
CASSANDRA-7839 changed how the Ec2Snitch names EC2 datacenters and racks so
that they follow AWS conventions.
The "legacy" mode was introduced to allow upgrades from Cassandra 3.0/3.x while
keeping the previous naming (the "standard" mode uses the new naming
convention).
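To illustrate the difference, here is a simplified sketch of how an availability zone maps to DC and rack names in each mode. It is based on my understanding of the 3.x behaviour (legacy: drop a trailing "-1" from the region and use the zone suffix such as "2a" as the rack; standard: region as DC, full AZ as rack) and ignores any datacenter suffix. Ec2NamingSketch and its methods are hypothetical, not Cassandra code.

{code:java}
// Simplified, hypothetical sketch of the two EC2 naming modes; not Ec2Snitch code.
// Any datacenter suffix from cassandra-rackdc.properties is ignored.
final class Ec2NamingSketch {
    // Standard mode (assumed): DC = region (AZ minus its zone letter), rack = full AZ.
    static String[] standard(String az) {
        String region = az.substring(0, az.length() - 1);
        return new String[] { region, az };
    }

    // Legacy mode (assumed 3.x behaviour): rack = last "-"-separated token ("2a"),
    // DC = region minus a trailing "-1" ("us-east-1" -> "us-east").
    static String[] legacy(String az) {
        String[] parts = az.split("-");
        String rack = parts[parts.length - 1];
        String region = az.substring(0, az.length() - 1);
        if (region.endsWith("1"))
            region = az.substring(0, az.length() - 3);
        return new String[] { region, rack };
    }

    public static void main(String[] args) {
        for (String az : new String[] { "us-west-2a", "us-east-1a" }) {
            String[] s = standard(az), l = legacy(az);
            System.out.printf("%s: standard dc=%s rack=%s | legacy dc=%s rack=%s%n",
                              az, s[0], s[1], l[0], l[1]);
        }
    }
}
{code}

For us-west-2a both modes yield the DC us-west-2 and differ only in the rack (us-west-2a vs 2a), whereas for us-east-1a the DC names differ as well.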
When upgrading a cluster in the us-west-2 region, the second node to be
upgraded failed to start with the following exception:
{code:java}
ERROR [main] 2020-06-16 09:14:42,218 Ec2Snitch.java:210 - This ec2-enabled snitch appears to be using the legacy naming scheme for regions, but existing nodes in cluster are using the opposite: region(s) = [us-west-2], availability zone(s) = [2a]. Please check the ec2_naming_scheme property in the cassandra-rackdc.properties configuration file for more details.
ERROR [main] 2020-06-16 09:14:42,219 CassandraDaemon.java:789 - Exception encountered during startup
java.lang.IllegalStateException: null
	at org.apache.cassandra.service.StorageService.validateEndpointSnitch(StorageService.java:573)
	at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:530)
	at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:800)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:659)
	at org.apache.cassandra.service.StorageService.initServer(StorageService.java:610)
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:373)
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:650)
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:767)
{code}
The exception originates in [this piece of code|https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L183-L185].
After adding some logging, it turned out that the DC name of the first upgraded
node was not recognized as a legacy name:
{code:java}
INFO [main] 2020-06-16 09:14:42,216 Ec2Snitch.java:183 - Detected DC us-west-2
INFO [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:185 - dcUsesLegacyFormat=false / usingLegacyNaming=true
ERROR [main] 2020-06-16 09:14:42,217 Ec2Snitch.java:188 - Invalid DC name us-west-2
{code}
The problem is that the regex used to detect legacy DC names matches some old
names as well as the new ones, so those old names are treated as standard:
{code:java}
boolean dcUsesLegacyFormat = !dc.matches("[a-z]+-[a-z].+-[\\d].*");
{code}
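A minimal reproduction of the check. Only the regex is copied from Ec2Snitch; the class and method names are hypothetical, and "us-east" is assumed to be the legacy name for us-east-1.

{code:java}
// Reproduces the DC-name check quoted above. Only the regex comes from Ec2Snitch;
// the class and method names are hypothetical.
final class DcNameCheck {
    // A DC name matching this pattern is treated as using the standard format.
    private static final String STANDARD_DC_PATTERN = "[a-z]+-[a-z].+-[\\d].*";

    static boolean dcUsesLegacyFormat(String dc) {
        return !dc.matches(STANDARD_DC_PATTERN);
    }

    public static void main(String[] args) {
        // "us-west-2" is produced by both modes; "us-east" is assumed to be the
        // legacy name for us-east-1, and "us-east-1" its standard name.
        for (String dc : new String[] { "us-west-2", "us-east", "us-east-1" }) {
            System.out.println(dc + " -> dcUsesLegacyFormat=" + dcUsesLegacyFormat(dc));
        }
    }
}
{code}

It reports dcUsesLegacyFormat=false for us-west-2, matching the log above, even though that is also the name legacy mode produces; only legacy names without a trailing region number (such as us-east) are detected.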
Since some DC names are identical in both modes (us-west-2, for example), I
don't see how DC names can be used to detect whether other nodes in the cluster
are using legacy mode.
Rack names, on the other hand, are completely different in legacy and standard
modes and can be used to detect mismatched settings.
My go-to fix would be to drop the check on datacenters by removing the
following lines:
[https://github.com/apache/cassandra/blob/cassandra-4.0-alpha4/src/java/org/apache/cassandra/locator/Ec2Snitch.java#L172-L186]
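A rough sketch of what a rack-only validation could look like. It assumes legacy racks are a digit followed by zone letters (such as the "2a" in the error above) and standard racks are full AZ names (such as "us-west-2a"); RackNamingCheck and its methods are hypothetical names, not Cassandra's API.

{code:java}
import java.util.Arrays;
import java.util.Collection;

// Hypothetical rack-only validation; not the actual Ec2Snitch implementation.
final class RackNamingCheck {
    // Assumption: legacy racks look like "2a", standard racks like "us-west-2a".
    static boolean rackUsesLegacyFormat(String rack) {
        return rack.matches("\\d[a-z]+");
    }

    // True when every rack seen in the cluster uses the same scheme as this node.
    static boolean namingSchemeMatches(Collection<String> racks, boolean usingLegacyNaming) {
        for (String rack : racks) {
            if (rackUsesLegacyFormat(rack) != usingLegacyNaming)
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // The existing nodes in the report expose rack "2a".
        Collection<String> existingRacks = Arrays.asList("2a");
        System.out.println(namingSchemeMatches(existingRacks, true));  // legacy node: accepted
        System.out.println(namingSchemeMatches(existingRacks, false)); // standard node: rejected
    }
}
{code}

Under these assumptions the two rack formats cannot be confused, so this check alone would still catch a node configured with the wrong ec2_naming_scheme.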
--
This message was sent by Atlassian Jira
(v8.3.4#803005)