[
https://issues.apache.org/jira/browse/CASSANDRA-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13798300#comment-13798300
]
Tyler Hobbs commented on CASSANDRA-5916:
----------------------------------------
That strategy sounds good to me in principle.
I'm seeing a few problems when testing, though.
If I start node4 with replace_address=node3 (while node3 is either up or down),
I get an NPE:
{noformat}
DEBUG 14:01:33,359 Node /127.0.0.4 state normal, token [6564349027099416762]
INFO 14:01:33,362 Node /127.0.0.4 state jump to normal
ERROR 14:01:33,363 Exception encountered during startup
java.lang.NullPointerException
at org.apache.cassandra.gms.Gossiper.usesHostId(Gossiper.java:682)
at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:694)
at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1382)
at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1250)
at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:973)
at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1187)
at org.apache.cassandra.service.StorageService.setTokens(StorageService.java:214)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:824)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:584)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:481)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:447)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:490)
java.lang.NullPointerException
at org.apache.cassandra.gms.Gossiper.usesHostId(Gossiper.java:682)
at org.apache.cassandra.gms.Gossiper.getHostId(Gossiper.java:694)
at org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1382)
at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1250)
at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:973)
at org.apache.cassandra.gms.Gossiper.addLocalApplicationState(Gossiper.java:1187)
at org.apache.cassandra.service.StorageService.setTokens(StorageService.java:214)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:824)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:584)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:481)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:447)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:490)
Exception encountered during startup: null
ERROR 14:01:33,368 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException
at org.apache.cassandra.service.StorageService.stopRPCServer(StorageService.java:321)
at org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:370)
at org.apache.cassandra.service.StorageService.access$000(StorageService.java:88)
at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:549)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at java.lang.Thread.run(Thread.java:724)
{noformat}
If I use replace_address with a non-existent node, then after the ring delay
sleep I see:
{noformat}
java.lang.RuntimeException: Unable to gossip with any seeds
{noformat}
which is misleading, since the real problem is that the address to replace
doesn't exist. Perhaps we should explicitly check for the presence of the
address to replace?
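A minimal sketch of what such an explicit check could look like. This is only an illustration under stated assumptions: the class, the method name, and the use of a plain set of endpoint strings are hypothetical and do not correspond to the actual Gossiper or StorageService code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch; none of these names are Cassandra APIs. */
final class ReplaceAddressCheck {

    /**
     * Fails fast with a descriptive message when the address to replace
     * is not among the endpoints learned from gossip (assumed here to be
     * available as a simple set of address strings).
     */
    static void requireKnownEndpoint(String replaceAddress, Set<String> knownEndpoints) {
        if (!knownEndpoints.contains(replaceAddress)) {
            throw new RuntimeException(
                "Cannot replace " + replaceAddress + ": no gossip state found for that address");
        }
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("127.0.0.1", "127.0.0.2", "127.0.0.3"));

        // A known endpoint passes silently.
        requireKnownEndpoint("127.0.0.3", known);

        // An unknown endpoint is rejected with a message naming the address.
        try {
            requireKnownEndpoint("127.0.0.9", known);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that the failure names the missing address, rather than surfacing later as an unrelated seed error.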
I've also seen that the node to replace can be the seed selected to gossip
with, which results in this:
{noformat}
INFO 14:12:58,298 Gathering node replacement information for /127.0.0.3
INFO 14:12:58,302 Starting Messaging Service on port 7000
DEBUG 14:12:58,316 attempting to connect to /127.0.0.3
ERROR 14:13:29,320 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1123)
at org.apache.cassandra.service.StorageService.prepareReplacementInfo(StorageService.java:396)
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:603)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:584)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:481)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:447)
at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:490)
{noformat}
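One possible way to avoid this case, sketched here purely as an illustration, would be to exclude the node being replaced from the seeds considered for the initial gossip round. The class and method names are hypothetical, addresses are modelled as plain strings, and the sketch assumes the node being replaced should never be contacted for this purpose; it is not the approach taken in any attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch; not the actual Gossiper seed-selection logic. */
final class ShadowRoundSeeds {

    /**
     * Returns the configured seeds minus the address being replaced,
     * preserving the original order.
     */
    static List<String> candidates(List<String> seeds, String replaceAddress) {
        List<String> result = new ArrayList<>();
        for (String seed : seeds) {
            if (!seed.equals(replaceAddress)) {
                result.add(seed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> seeds = Arrays.asList("127.0.0.1", "127.0.0.3");
        // The replaced node (127.0.0.3) is dropped from the candidates.
        System.out.println(candidates(seeds, "127.0.0.3"));
    }
}
```

If the node being replaced is the only configured seed, the result is empty, and the caller would still need to report that condition explicitly.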
> gossip and tokenMetadata get hostId out of sync on failed replace_node with
> the same IP address
> -----------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-5916
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5916
> Project: Cassandra
> Issue Type: Bug
> Reporter: Brandon Williams
> Assignee: Brandon Williams
> Fix For: 1.2.12
>
> Attachments: 5916.txt, 5916-v2.txt, 5916-v3.txt
>
>
> If you try to replace_node an existing, live hostId, it will error out.
> However if you're using an existing IP to do this (as in, you chose the wrong
> uuid to replace on accident) then the newly generated hostId wipes out the
> old one in TMD, and when you do try to replace it replace_node will complain
> it does not exist. Examination of gossipinfo still shows the old hostId,
> however now you can't replace it either.