[ 
https://issues.apache.org/jira/browse/CASSANDRA-10134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15243068#comment-15243068
 ] 

Sam Tunnicliffe commented on CASSANDRA-10134:
---------------------------------------------

One of the MV dtests uncovered a small problem for which I've pushed an 
additional commit, and otherwise CI looks good now. 

Building an MV involves writes to the {{system_distributed}} keyspace, which in 
turn requires replica info and so can't be done until we've gone through 
initialization of {{StorageService}}. In fact, in {{CassandraDaemon}} where 
build tasks for all views are submitted at startup (to force completion of any 
interrupted builds), the comment mentions that SS must be initialized first. 
However, the {{Keyspace}} constructor also triggers submission of build tasks 
for all of its views via {{ViewManager::reload}} and this happens prior to SS 
initialization during startup. So there's a race at startup between SS 
initialization and any view build task reaching a point where it needs to 
update {{system_distributed}}; the window for this race is widened here by the 
mandatory shadow round and so 
{{MaterializedViewTest.interrupt_build_process_test}} was failing pretty 
regularly. The downside of the fix in the patch is that MV builds won't get 
submitted while gossip is stopped (via JMX or nodetool) as this marks SS as 
uninitialized. This doesn't seem like a particularly big problem to me, but if 
there are concerns over that I'm willing to revisit.
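
To make the trade-off concrete, here is a minimal sketch of the kind of guard 
described above. It is not the code from the patch: {{ViewBuildGate}}, 
{{setInitialized}}, {{maybeSubmitBuild}} and {{submittedBuilds}} are hypothetical 
names standing in for the check against SS initialization, and a plain list 
stands in for the real build executor.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; these are not Cassandra classes.
final class ViewBuildGate {
    // Mirrors the "SS is initialized" flag; assumed to be flipped at the end of
    // startup and cleared again when gossip is stopped.
    private boolean initialized = false;

    // Stands in for the executor that would run the view builds.
    private final List<String> submitted = new ArrayList<>();

    synchronized void setInitialized(boolean value) {
        initialized = value;
    }

    // Submits a build only when replica info can be expected to be available.
    // Returns false when the submission is skipped.
    synchronized boolean maybeSubmitBuild(String viewName) {
        if (!initialized) {
            return false; // too early (startup) or gossip currently stopped
        }
        submitted.add(viewName);
        return true;
    }

    // Returns a copy so callers cannot mutate the internal list.
    synchronized List<String> submittedBuilds() {
        return new ArrayList<>(submitted);
    }
}
```

With a guard of this shape, a submission triggered from the {{Keyspace}} 
constructor during startup is skipped, and the later submission from 
{{CassandraDaemon}} picks the build up once SS is initialized. The same guard 
is what causes submissions to be skipped while gossip is stopped.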


> Always require replace_address to replace existing address
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-10134
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10134
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Distributed Metadata
>            Reporter: Tyler Hobbs
>            Assignee: Sam Tunnicliffe
>             Fix For: 3.x
>
>
> Normally, when a node is started from a clean state with the same address as 
> an existing down node, it will fail to start with an error like this:
> {noformat}
> ERROR [main] 2015-08-19 15:07:51,577 CassandraDaemon.java:554 - Exception 
> encountered during startup
> java.lang.RuntimeException: A node with address /127.0.0.3 already exists, 
> cancelling join. Use cassandra.replace_address if you want to replace this 
> node.
>       at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:543)
>  ~[main/:na]
>       at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:783)
>  ~[main/:na]
>       at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:720)
>  ~[main/:na]
>       at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:611)
>  ~[main/:na]
>       at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378) 
> [main/:na]
>       at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:537)
>  [main/:na]
>       at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:626) 
> [main/:na]
> {noformat}
> However, if {{auto_bootstrap}} is set to false or the node is in its own seed 
> list, it will not throw this error and will start normally.  The new node 
> then takes over the host ID of the old node (even if the tokens are 
> different), and the only message you will see is a warning in the other 
> nodes' logs:
> {noformat}
> logger.warn("Changing {}'s host ID from {} to {}", endpoint, storedId, 
> hostId);
> {noformat}
> This could cause an operator to accidentally wipe out the token information 
> for a down node without replacing it.  To fix this, we should check for an 
> endpoint collision even if {{auto_bootstrap}} is false or the node is a seed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
