This is rc2, I am assuming? One thing about remove: the removetoken force command is meant to be run on the node that originally started a remove, and it doesn't take a token parameter. Not relevant to your problem though.
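For what it's worth, here is roughly the sequence I'd expect (just a sketch; I'm assuming 0.7 rc2 nodetool syntax, with "localhost" standing in for the node that initiated the removal):

    # on the node where the original removetoken was started
    nodetool -h localhost removetoken status   # shows whether a removal is still pending
    nodetool -h localhost removetoken force    # no token argument; forces the pending removal to complete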
Is this a test cluster, and have you tried to reproduce the error? I would be interested to know what the ring command looks like on both *.19 and *.17 after the decommission is run (the exact commands I have in mind are below the quoted message). I assume you were running the ring command on another node? I'll look into the logs more and see if anything jumps out.

On Wed, Dec 15, 2010 at 6:37 AM, Dan Hendry <dan.hendry.j...@gmail.com> wrote:

> I am seeing very strange things when trying to decommission a node in my
> cluster (detailed logs attached). Here is a nodetool ring report **after**
> decommissioning of node 192.168.4.19 (as seen by any other, properly
> functioning node).
>
> 192.168.4.15  Up    Normal   49.9 GB   25.00%  42535295865117307932921825928971026431
> 192.168.4.20  Up    Normal   42.56 GB   8.33%  56713727820156410577229101238628035242
> 192.168.4.16  Up    Normal   29.17 GB  16.67%  85070591730234615865843651857942052863
> 192.168.4.19  Down  Leaving  54.11 GB  16.67%  113427455640312821154458202477256070484
> 192.168.4.17  Down  Normal   48.88 GB   8.33%  127605887595351923798765477786913079295
> 192.168.4.18  Up    Normal   59.44 GB  25.00%  170141183460469231731687303715884105726
> 192.168.4.12  Up    Normal   52.3 GB    0.00%  170141183460469231731687303715884105727
>
> What I am seeing is that after nodetool decommission completes on
> 192.168.4.19, the next node in the ring (192.168.4.17) ‘dies’ (see attached
> log, its nodetool ring report is quite different). By ‘dies’ I mean that it
> stops communicating with other nodes (but the Cassandra process is still
> running and, among other things, compaction continues). After restarting
> Cassandra on 192.168.4.17, the ring state gets stuck and the decommissioned
> node (192.168.4.19) does not get removed (at least from the nodetool ring
> report):
>
> 192.168.4.15  Up    Normal   49.9 GB   25.00%  42535295865117307932921825928971026431
> 192.168.4.20  Up    Normal   42.56 GB   8.33%  56713727820156410577229101238628035242
> 192.168.4.16  Up    Normal   29.17 GB  16.67%  85070591730234615865843651857942052863
> 192.168.4.19  Down  Leaving  54.11 GB  16.67%  113427455640312821154458202477256070484
> 192.168.4.17  Up    Normal   69.12 GB   8.33%  127605887595351923798765477786913079295
> 192.168.4.18  Up    Normal   58.88 GB  25.00%  170141183460469231731687303715884105726
> 192.168.4.12  Up    Normal   52.3 GB    0.00%  170141183460469231731687303715884105727
>
> Furthermore, when I try running “nodetool removetoken
> 113427455640312821154458202477256070484”, I get:
>
> Exception in thread "main" java.lang.UnsupportedOperationException: Node /192.168.4.19 is already being removed.
>         at org.apache.cassandra.service.StorageService.removeToken(StorageService.java:1731)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>
> And when I try running “nodetool removetoken force
> 113427455640312821154458202477256070484”, I get:
>
> RemovalStatus: No token removals in process.
>
> Exception in thread "main" java.lang.NullPointerException
>         at org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:1703)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:93)
>
> ?!?!?!?
>
> I think I have seen this type of behaviour once or twice before (I believe
> 0.7 b1 or later) but wrote it off as being caused by my misguided tinkering
> and/or other Cassandra bugs. This time around, I have done very little with
> JMX/CLI/nodetool and I can find no related Cassandra bugs.
>
> Help/suggestions?
>
> Dan Hendry
> (403) 660-2297
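To be concrete about the ring comparison I asked for above: from another box I'd run something like the following and diff the output against a healthy node's view (just a sketch; I'm assuming nodetool can reach both machines on the default 0.7 JMX port, 8080):

    nodetool -h 192.168.4.19 -p 8080 ring
    nodetool -h 192.168.4.17 -p 8080 ring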