[ 
https://issues.apache.org/jira/browse/CASSANDRA-15131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

lujie updated CASSANDRA-15131:
------------------------------
    Description: 
Reproduce:
 # start a three nodes cluster by : ./bin/cassandra -f
 # shutdown one node by Ctrl-C
 # Then remove a node by:./bin/nodetool  removenode 
2331c0c1-f799-4f35-9323-c57ad020732b
 # But this process is too slow, so we force remove the node by:./bin/nodetool  
removenode force 
 # NPE happens in client
 # 
{code:java}
RemovalStatus: Removing token (-9206149340638432876). Waiting for replication 
confirmation from [/10.3.1.11,/10.3.1.14].
error: null
-- StackTrace --
java.lang.NullPointerException
at 
org.apache.cassandra.gms.VersionedValue$VersionedValueFactory.removedNonlocal(VersionedValue.java:214)
at org.apache.cassandra.gms.Gossiper.advertiseTokenRemoved(Gossiper.java:556)
at 
org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:4353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1471)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1312)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1404)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:832)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$81(TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

Code Analysis 

1. removeNode will mark the node as Leaving
{code:java}
tokenMetadata.addLeavingEndpoint(endpoint);
{code}
2. forceRemoveNode then step into remove
{code:java}
1. if (!replicatingNodes.isEmpty() || 
!tokenMetadata.getLeavingEndpoints().isEmpty())
2. {
3.   logger.warn("Removal not confirmed for for {}", 
StringUtils.join(this.replicatingNodes, ","));
4.   for (InetAddress endpoint : tokenMetadata.getLeavingEndpoints())
5.   {
6.      UUID hostId = tokenMetadata.getHostId(endpoint);
7.      Gossiper.instance.advertiseTokenRemoved(endpoint, hostId);
8.      excise(tokenMetadata.getTokens(endpoint), endpoint);
10   }
11   replicatingNodes.clear();
12   removingNode = null;
13 }
{code}
3 .code line#6,will get hostId, but if removeNode execute completely right now 
and it will remove host : *tokenMetadata.removeEndpoint(endpoint);*

So hostId is null.

4. code line#7 will call *hostId.toString(),* hence NPE happens.

 The ugly NPE can't show what happens in force remove request and we should fix 
it. We found this bug in version 3.11.4, the trunk also has this bug. I will 
give the patch soon.

 

  was:
Reproduce:
 # start a three nodes cluster by : ./bin/cassandra -f
 # shutdown one node by Ctrl-C
 # Then remove a node by:./bin/nodetool  removenode 
2331c0c1-f799-4f35-9323-c57ad020732b
 # But this process is too slow, so we force remove the node by:./bin/nodetool  
removenode force 
 # NPE happens in client
 # 
{code:java}
RemovalStatus: Removing token (-9206149340638432876). Waiting for replication 
confirmation from [/10.3.1.11,/10.3.1.14].
error: null
-- StackTrace --
java.lang.NullPointerException
at 
org.apache.cassandra.gms.VersionedValue$VersionedValueFactory.removedNonlocal(VersionedValue.java:214)
at org.apache.cassandra.gms.Gossiper.advertiseTokenRemoved(Gossiper.java:556)
at 
org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:4353)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
at 
com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
at com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
at 
com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
at 
javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1471)
at 
javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
at 
javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1312)
at 
javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1404)
at 
javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:832)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
at sun.rmi.transport.Transport$1.run(Transport.java:200)
at sun.rmi.transport.Transport$1.run(Transport.java:197)
at java.security.AccessController.doPrivileged(Native Method)
at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$81(TCPTransport.java:683)
at java.security.AccessController.doPrivileged(Native Method)
at 
sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
{code}

Code Analysis 

1. removeNode will mark the node as Leaving
{code:java}
tokenMetadata.addLeavingEndpoint(endpoint);
{code}
2. forceRemoveNode then step into remove
{code:java}
1. if (!replicatingNodes.isEmpty() || 
!tokenMetadata.getLeavingEndpoints().isEmpty())
2. {
3.   logger.warn("Removal not confirmed for for {}", 
StringUtils.join(this.replicatingNodes, ","));
4.   for (InetAddress endpoint : tokenMetadata.getLeavingEndpoints())
5.   {
6.      UUID hostId = tokenMetadata.getHostId(endpoint);
7.      Gossiper.instance.advertiseTokenRemoved(endpoint, hostId);
8.      excise(tokenMetadata.getTokens(endpoint), endpoint);
10   }
11   replicatingNodes.clear();
12   removingNode = null;
13 }
{code}
3 .code line#6,will get hostId, but if removeNode execute completely right now 
and it will remove host : *tokenMetadata.removeEndpoint(endpoint);*

So hostId is null.

4. code line#7 will call *hostId.toString(),* hence NPE happens.

We found this bug in version 3.11.4, but we check the trunk also has this bug. 
I will give the patch soon.

 


> NPE while force remove a node
> -----------------------------
>
>                 Key: CASSANDRA-15131
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15131
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: lujie
>            Assignee: lujie
>            Priority: Normal
>         Attachments: 0001-fix-CASSANDRA-15131.patch
>
>
> Reproduce:
>  # start a three nodes cluster by : ./bin/cassandra -f
>  # shutdown one node by Ctrl-C
>  # Then remove a node by:./bin/nodetool  removenode 
> 2331c0c1-f799-4f35-9323-c57ad020732b
>  # But this process is too slow, so we force remove the node 
> by:./bin/nodetool  removenode force 
>  # NPE happens in client
>  # 
> {code:java}
> RemovalStatus: Removing token (-9206149340638432876). Waiting for replication 
> confirmation from [/10.3.1.11,/10.3.1.14].
> error: null
> -- StackTrace --
> java.lang.NullPointerException
> at 
> org.apache.cassandra.gms.VersionedValue$VersionedValueFactory.removedNonlocal(VersionedValue.java:214)
> at org.apache.cassandra.gms.Gossiper.advertiseTokenRemoved(Gossiper.java:556)
> at 
> org.apache.cassandra.service.StorageService.forceRemoveCompletion(StorageService.java:4353)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at 
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at 
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1471)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
> at 
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1312)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1404)
> at 
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:832)
> at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
> at sun.rmi.transport.Transport$1.run(Transport.java:200)
> at sun.rmi.transport.Transport$1.run(Transport.java:197)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> at sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
> at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
> at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$81(TCPTransport.java:683)
> at java.security.AccessController.doPrivileged(Native Method)
> at 
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
> Code Analysis 
> 1. removeNode will mark the node as Leaving
> {code:java}
> tokenMetadata.addLeavingEndpoint(endpoint);
> {code}
> 2. forceRemoveNode then step into remove
> {code:java}
> 1. if (!replicatingNodes.isEmpty() || 
> !tokenMetadata.getLeavingEndpoints().isEmpty())
> 2. {
> 3.   logger.warn("Removal not confirmed for for {}", 
> StringUtils.join(this.replicatingNodes, ","));
> 4.   for (InetAddress endpoint : tokenMetadata.getLeavingEndpoints())
> 5.   {
> 6.      UUID hostId = tokenMetadata.getHostId(endpoint);
> 7.      Gossiper.instance.advertiseTokenRemoved(endpoint, hostId);
> 8.      excise(tokenMetadata.getTokens(endpoint), endpoint);
> 10   }
> 11   replicatingNodes.clear();
> 12   removingNode = null;
> 13 }
> {code}
> 3 .code line#6,will get hostId, but if removeNode execute completely right 
> now and it will remove host : *tokenMetadata.removeEndpoint(endpoint);*
> So hostId is null.
> 4. code line#7 will call *hostId.toString(),* hence NPE happens.
>  The ugly NPE can't show what happens in force remove request and we should 
> fix it. We found this bug in version 3.11.4, the trunk also has this bug. I 
> will give the patch soon.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to