Sounds like you want to bootstrap to a token close to but not the same as the old one then. :)
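For integer tokens like the one in the log below, one way to pick a neighbouring token is to subtract one from the old token. This is only a sketch: it assumes tokens are non-negative integers below 2^127 (the range RandomPartitioner uses), and the class and method names are made up for illustration.

```java
import java.math.BigInteger;

public class NeighborToken {
    // Assumption: the ring holds integer tokens in [0, 2^127).
    static final BigInteger RING_SIZE = BigInteger.ONE.shiftLeft(127);

    /** Returns the token immediately before oldToken on the ring, wrapping at zero. */
    public static BigInteger tokenJustBefore(BigInteger oldToken) {
        // BigInteger.mod always returns a non-negative value, so 0 wraps to 2^127 - 1.
        return oldToken.subtract(BigInteger.ONE).mod(RING_SIZE);
    }

    public static void main(String[] args) {
        BigInteger oldToken = new BigInteger("136112946768375385385349842972707284580");
        // prints 136112946768375385385349842972707284579
        System.out.println(tokenJustBefore(oldToken));
    }
}
```

The result would go into InitialToken on the replacement node; the entry for the dead node can be removed once the bootstrap has finished.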
2010/1/19 XL.Pan <pan_xiao...@sina.com>:
> Hi Jonathan:
> I find that other nodes throw an exception when I use the same token but a
> different IP address for a new node while bootstrapping. My steps were:
> 1) take down nodeA, whose token is T and whose IP address is IPA;
> 2) add a new nodeB, whose token is also T but whose IP address is IPB;
> 3) after nodeA knows nodeB is bootstrapping, A throws an exception:
>
> DEBUG [GMFD:2] 2010-01-19 22:22:36,667 StorageService.java (line 439) Node /10.81.37.52 state bootstrapping, token 136112946768375385385349842972707284580
> ERROR [GMFD:2] 2010-01-19 22:22:36,667 DebuggableThreadPoolExecutor.java (line 157) Error in ThreadPoolExecutor
> java.lang.RuntimeException: Bootstrap Token collision between /10.81.37.65 and /10.81.37.52 (token 136112946768375385385349842972707284580
>         at org.apache.cassandra.locator.TokenMetadata.addBootstrapToken(TokenMetadata.java:136)
>         at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:456)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:419)
>         at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
>         at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:611)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:979)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:619)
> ERROR [GMFD:2] 2010-01-19 22:22:36,667 CassandraDaemon.java (line 71) Fatal exception in thread Thread[GMFD:2,5,main]
> java.lang.RuntimeException: Bootstrap Token collision between /10.81.37.65 and /10.81.37.52 (token 136112946768375385385349842972707284580
>         at org.apache.cassandra.locator.TokenMetadata.addBootstrapToken(TokenMetadata.java:136)
>         at org.apache.cassandra.service.StorageService.handleStateBootstrap(StorageService.java:456)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:419)
>         at org.apache.cassandra.gms.Gossiper.doNotifications(Gossiper.java:692)
>         at org.apache.cassandra.gms.Gossiper.applyApplicationStateLocally(Gossiper.java:657)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:611)
>         at org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(Gossiper.java:979)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:38)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
>         at java.lang.Thread.run(Thread.java:619)
>
> I traced the code and found the reason:
>
>     public void addBootstrapToken(Token token, InetAddress endpoint)
>     {
>         ......
>         oldEndPoint = bootstrapTokens.get(token);
>         if (oldEndPoint != null && !oldEndPoint.equals(endpoint))
>             throw new RuntimeException("Bootstrap Token collision between " + oldEndPoint + " and " + endpoint + " (token " + token);
>         // the exception is thrown by the second check, just below :-)
>         oldEndPoint = tokenToEndPointMap.get(token);
>         if (oldEndPoint != null && !oldEndPoint.equals(endpoint))
>             throw new RuntimeException("Bootstrap Token collision between " + oldEndPoint + " and " + endpoint + " (token " + token);
>         ......
>     }
>
> In my environment, A is still in the tokenToEndPointMap though it's down. As a
> result, A will not know B is bootstrapping and will not route any request to
> B until B changes to NORMAL.
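To make the failure mode concrete, here is a small self-contained model of the check quoted above. It is not the Cassandra source: the class name, the use of plain strings for tokens and endpoints, and the final `put` (standing in for the elided "......") are all assumptions made for illustration. It shows that a bootstrap with the same token and a different address is rejected while the old endpoint is still in the normal-token map, whereas a nearby token is accepted.

```java
import java.util.HashMap;
import java.util.Map;

public class TokenCollisionModel {
    // Simplified stand-ins for the two maps named in the quoted method.
    private final Map<String, String> bootstrapTokens = new HashMap<>();
    private final Map<String, String> tokenToEndPointMap = new HashMap<>();

    /** Records an endpoint that owns a token in the normal state (hypothetical helper). */
    public void addNormalToken(String token, String endpoint) {
        tokenToEndPointMap.put(token, endpoint);
    }

    /** Mirrors the two collision checks from the quoted addBootstrapToken. */
    public void addBootstrapToken(String token, String endpoint) {
        String oldEndPoint = bootstrapTokens.get(token);
        if (oldEndPoint != null && !oldEndPoint.equals(endpoint))
            throw new RuntimeException("Bootstrap Token collision between " + oldEndPoint
                    + " and " + endpoint + " (token " + token);
        // The check that fires in the log: the down node is still registered here.
        oldEndPoint = tokenToEndPointMap.get(token);
        if (oldEndPoint != null && !oldEndPoint.equals(endpoint))
            throw new RuntimeException("Bootstrap Token collision between " + oldEndPoint
                    + " and " + endpoint + " (token " + token);
        // Assumption: the elided remainder records the bootstrapping endpoint.
        bootstrapTokens.put(token, endpoint);
    }

    public static void main(String[] args) {
        TokenCollisionModel ring = new TokenCollisionModel();
        ring.addNormalToken("100", "/10.81.37.65");        // old node, down but still known
        try {
            ring.addBootstrapToken("100", "/10.81.37.52"); // same token, new address
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        ring.addBootstrapToken("99", "/10.81.37.52");      // nearby token, no collision
        System.out.println("accepted token 99");
    }
}
```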
>
> I think the following change may be one way to deal with this issue:
>
>     if (oldEndPoint != null && !oldEndPoint.equals(endpoint) && FailureDetector.instance().isAlive(oldEndPoint))
>         throw new RuntimeException("Bootstrap Token collision between " + oldEndPoint + " and " + endpoint + " (token " + token);
>
> Will you please give me some advice? Thanks!
>
>
> ------------------
> XL.Pan
> 2010-01-20
>
> -------------------------------------------------------------
> From: Michael Lee
> Sent: 2010-01-19 21:37:52
> To: cassandra-u...@incubator.apache.org
> Cc:
> Subject: RE: Re: replace a bad node through bootstrapping
>
> Sorry, my bad!
>
> So, if we use the same token with a different IP, the 'nodeprobe removetoken'
> step is not needed, am I right?
>
> If so, I think the wiki page should state it clearly; that's my advice....
>
> And once again: suppose the replacement node is damaged in turn and we replace
> the bad one with a new node, this time using the same token again together
> with the 'old' IP.
> For example:
>
> Step 1: replace A (token=11111, ip=1.1.1.1) with B (token=11111, ip=2.2.2.2);
> according to what we discussed before, it will work.
> Step 2: we fix A and format its whole file system. Now B goes bad, and we
> replace B (token=11111, ip=2.2.2.2) with A (token=11111, ip=1.1.1.1), because
> we have only one backup IP and one backup node.
>
> Does step 2 still work?
>
>
> ---------END----------
>
> -----Original Message-----
> From: Jonathan Ellis [mailto:jbel...@gmail.com]
> Sent: Tuesday, January 19, 2010 9:57 AM
> To: cassandra-user@incubator.apache.org
> Subject: Re: Re: replace a bad node through bootstrapping
>
> This is described in the "Handling failure" section of the Operations page.
>
> I believe it will work even if you use the same token as the old node, yes.
>
> -Jonathan
>
> 2010/1/18 Michael Lee <mail.list.steel.men...@gmail.com>:
>> Even if the good node has the same token as the bad one?
>>
>> Is it a 'documented' operation? I haven't seen it in the wiki links (Operation).
>>
>> ---------END----------
>>
>> -----Original Message-----
>> From: Jonathan Ellis [mailto:jbel...@gmail.com]
>> Sent: Monday, January 18, 2010 11:22 PM
>> To: cassandra-user@incubator.apache.org
>> Subject: Re: Re: Re: replace a bad node through bootstrapping
>>
>> yes
>>
>> On Mon, Jan 18, 2010 at 4:37 AM, XL.Pan <pan_xiao...@sina.com> wrote:
>>> Hi Jonathan:
>>> "the old node can be the replacement, as long as you change its IP address"
>>>
>>> Do you mean that the procedure to replace a bad node is:
>>> 1) choose a new machine that has the same configuration (e.g. InitialToken) but a different IP address;
>>> 2) start the new machine, which will start bootstrapping;
>>> 3) after bootstrapping, the new machine will have restored the data as before.
>>>
>>> (All nodes' InitialToken values are set manually.)
>>>
>>> I have tried it this way and it looks OK. Is this a good way? :-)
>>> Thanks !!
>>>
>>>
>>> ------------------
>>> XL.Pan
>>> 2010-01-18
>>>
>>> -------------------------------------------------------------
>>> From: Jonathan Ellis
>>> Sent: 2010-01-15 11:12:05
>>> To: cassandra-user
>>> Cc:
>>> Subject: Re: Re: replace a bad node through bootstrapping
>>>
>>> On Thu, Jan 14, 2010 at 9:02 PM, Michael Lee
>>> <mail.list.steel.men...@gmail.com> wrote:
>>>> If a node's data has been damaged, you cannot use a new node to replace the old one directly, unless you 'removetoken' first.
>>>>
>>>> But (suppose node A is dead)
>>>> 'removetoken' will first re-create the replicas missing because of A's death, which will generate a lot of data on other nodes, say B, C, D.
>>>> After adding the new node and copying data from the other nodes through bootstrapping, you have to 'cleanup' the data just
>>>> generated by 'removetoken' on B, C, D.
>>>>
>>>> So B/C/D will have a heavy I/O load (half of it wasted) from repairing A; in Pan's case, it will be 5TB (and will take days...)
>>>>
>>>> Pan is trying to devise a method to repair A directly through streaming, with less impact on the other nodes.
>>>
>>> Thanks for clarifying that.
>>>
>>> I thought we agreed in your last thread about this that bootstrapping
>>> a replacement node (the old node can be the replacement, as long as
>>> you change its IP address) first, then removing the entry for the dead
>>> one, would be a reasonable procedure here.
>>>
>>> -Jonathan