[ https://issues.apache.org/jira/browse/CASSANDRA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15113161#comment-15113161 ]

Joel Knighton commented on CASSANDRA-10134:
-------------------------------------------

I'd be pretty comfortable reviewing the patch + Tyler's suggestions for inclusion in 2.2+. This is one of the more straightforward parts of gossip.

If we're restricting this to newer versions because of the extent of the changes, it might be worth going a little farther with this. The intent is to improve safety, but as implemented, a node with auto_bootstrap: false or acting as a seed can still replace an existing address if it is partitioned from all seeds or otherwise unable to communicate with all seeds during the shadow round.

If we expose "in a shadow round" in some form (exposed through gossip or otherwise), we can restrict successfully exiting the shadow round to nodes who have either successfully gossiped with a seed node or learned all seed nodes are in their shadow round. This would also remove the delay on start up, but it would require starting seed nodes in a new cluster with multiple seeds concurrently.

This (or some variation on it) is a bit safer, but it may not be worth the effort.

> Always require replace_address to replace existing address
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-10134
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10134
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Distributed Metadata
>            Reporter: Tyler Hobbs
>            Assignee: Stefania
>             Fix For: 2.2.x, 3.0.x, 3.x
>
>
> Normally, when a node is started from a clean state with the same address as
> an existing down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 CassandraDaemon.java:554 - Exception encountered during startup
> java.lang.RuntimeException: A node with address /127.0.0.3 already exists, cancelling join. Use cassandra.replace_address if you want to replace this node.
>     at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:720) ~[main/:na]
>     at org.apache.cassandra.service.StorageService.initServer(StorageService.java:611) ~[main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) [main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537) [main/:na]
>     at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:626) [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed
> list, it will not throw this error and will start normally. The new node
> then takes over the host ID of the old node (even if the tokens are
> different), and the only message you will see is a warning in the other
> nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information
> for a down node without replacing it. To fix this, we should check for an
> endpoint collision even if {{auto_bootstrap}} is false or the node is a seed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
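
For illustration only, the exit rule suggested in the comment (leave the shadow round only after gossiping with a seed, or after learning that every seed is itself in its shadow round) could be sketched roughly as below. This is not Cassandra code: the class name, the method name, and the three sets are hypothetical, and the sketch assumes the configured seed list includes the local node when it is a seed.

```java
import java.util.Set;

// Hypothetical sketch of the proposed shadow-round exit rule; not part of Cassandra.
public class ShadowRoundSketch {

    /**
     * @param seeds         all configured seed addresses (assumed to include this node if it is a seed)
     * @param gossipedWith  seeds that answered with real gossip state during the shadow round
     * @param inShadowRound seeds known to be in their own shadow round
     * @return true if the node may leave the shadow round and run its endpoint collision check
     */
    static boolean mayExitShadowRound(Set<String> seeds,
                                      Set<String> gossipedWith,
                                      Set<String> inShadowRound) {
        // Condition 1: at least one seed gave us real cluster state, so a
        // collision with an existing address would be visible.
        for (String seed : seeds) {
            if (gossipedWith.contains(seed)) {
                return true;
            }
        }
        // Condition 2: every seed is itself still in a shadow round, which is
        // what a brand-new cluster with concurrently started seeds looks like.
        return !seeds.isEmpty() && inShadowRound.containsAll(seeds);
    }

    public static void main(String[] args) {
        Set<String> seeds = Set.of("10.0.0.1", "10.0.0.2");

        // Partitioned from all seeds: must not exit, so it cannot silently
        // take over an existing address.
        System.out.println(mayExitShadowRound(seeds, Set.of(), Set.of()));

        // Reached one seed: may exit and perform the collision check.
        System.out.println(mayExitShadowRound(seeds, Set.of("10.0.0.1"), Set.of()));

        // New cluster, both seeds starting concurrently: may exit.
        System.out.println(mayExitShadowRound(seeds, Set.of(), seeds));
    }
}
```

Under this rule the partitioned case never exits the shadow round, which is the safety gain described in the comment. The cost is also the one noted there: in a new cluster with multiple seeds, the seeds have to be started concurrently so that each can observe the others in their shadow round.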