[ https://issues.apache.org/jira/browse/CASSANDRA-13058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818371#comment-15818371 ]
Stefan Podkowinski commented on CASSANDRA-13058:
------------------------------------------------
The test is passing because hint delivery is not resumable in 3.0
(https://issues.apache.org/jira/browse/CASSANDRA-6230). This was only
fixed recently, in 3.10 (CASSANDRA-11960).
Because of the missing reply messages addressed in the patch, node2 would never
respond while handling non-local hints. That in turn causes all callbacks
on node1 to time out, and the HintDispatcher retries hint delivery. At this
point, the FailureDetector correctly reports the node as alive, so the
dispatch process is not aborted; it simply tries to consume the next hints
from a now-empty iterator and then terminates with a successful return value.
The log file will contain a "Finished hinted handoff of file
[].hints to endpoint []" message, which is technically correct but probably a
bit misleading when all writes just timed out.
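To make the sequence above easier to follow, here is a minimal toy model in
Python. It is not Cassandra's dispatcher code: the function `dispatch` and its
parameters `target_alive` and `target_acks` are hypothetical names chosen for
illustration, and the only assumption carried over from the description is
that hints are read from a one-shot iterator that cannot be rewound.

```python
def dispatch(hints, target_alive, target_acks):
    """Toy model of a non-resumable dispatch loop (hypothetical, not Cassandra code).

    Returns a (status, delivered_count) tuple.
    """
    hint_iter = iter(hints)  # one-shot iterator: consumed hints cannot be re-read
    delivered = 0
    while True:
        if not target_alive():
            # Stand-in for the failure detector reporting the target as down.
            return "ABORT", delivered
        batch = list(hint_iter)  # drains whatever is left in the iterator
        if not batch:
            # Nothing left to read, so the run is reported as finished,
            # regardless of how many hints were actually acknowledged.
            return "SUCCESS", delivered
        # Hints whose callback "times out" (no ack) are not re-queued;
        # the loop simply retries with the already-drained iterator.
        delivered += sum(1 for hint in batch if target_acks(hint))


# The target is alive but never replies: every callback times out.
status, delivered = dispatch(["h1", "h2", "h3"], lambda: True, lambda hint: False)
print(status, delivered)
```

In this model the run ends with a success status even though no hint was
acknowledged, which mirrors the misleading "Finished hinted handoff" log line.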
Even with the FailureDetector reporting the target node as unavailable and
running into the ABORT case, we'd still get the exception reported in
CASSANDRA-11960. All things considered, chances are high that hints will be
lost in 3.0 if any error occurs during delivery.
> dtest failure in hintedhandoff_test.TestHintedHandoff.hintedhandoff_decom_test
> ------------------------------------------------------------------------------
>
> Key: CASSANDRA-13058
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13058
> Project: Cassandra
> Issue Type: Test
> Components: Testing
> Reporter: Sean McCarthy
> Assignee: Stefan Podkowinski
> Priority: Blocker
> Labels: dtest, test-failure
> Fix For: 3.10
>
> Attachments: 13058-3.x.patch, node1.log, node1_debug.log,
> node1_gc.log, node2.log, node2_debug.log, node2_gc.log, node3.log,
> node3_debug.log, node3_gc.log, node4.log, node4_debug.log, node4_gc.log
>
>
> example failure:
> http://cassci.datastax.com/job/cassandra-3.X_novnode_dtest/16/testReport/hintedhandoff_test/TestHintedHandoff/hintedhandoff_decom_test/
> {code}
> Error Message
> Subprocess ['nodetool', '-h', 'localhost', '-p', '7100', ['decommission']]
> exited with non-zero status; exit status: 2;
> stderr: error: Error while decommissioning node: Failed to transfer all hints
> to 59f20b4f-0215-4e18-be1b-7e00f2901629
> {code}{code}
> Stacktrace
> File "/usr/lib/python2.7/unittest/case.py", line 329, in run
> testMethod()
> File "/home/automaton/cassandra-dtest/hintedhandoff_test.py", line 167, in
> hintedhandoff_decom_test
> node1.decommission()
> File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1314, in
> decommission
> self.nodetool("decommission")
> File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 783, in
> nodetool
> return handle_external_tool_process(p, ['nodetool', '-h', 'localhost',
> '-p', str(self.jmx_port), cmd.split()])
> File "/usr/local/lib/python2.7/dist-packages/ccmlib/node.py", line 1993, in
> handle_external_tool_process
> raise ToolError(cmd_args, rc, out, err)
> {code}{code}
> java.lang.RuntimeException: Error while decommissioning node: Failed to
> transfer all hints to 59f20b4f-0215-4e18-be1b-7e00f2901629
> at
> org.apache.cassandra.service.StorageService.decommission(StorageService.java:3924)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.Trampoline.invoke(MethodUtil.java:71)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.reflect.misc.MethodUtil.invoke(MethodUtil.java:275)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:112)
> at
> com.sun.jmx.mbeanserver.StandardMBeanIntrospector.invokeM2(StandardMBeanIntrospector.java:46)
> at
> com.sun.jmx.mbeanserver.MBeanIntrospector.invokeM(MBeanIntrospector.java:237)
> at com.sun.jmx.mbeanserver.PerInterface.invoke(PerInterface.java:138)
> at com.sun.jmx.mbeanserver.MBeanSupport.invoke(MBeanSupport.java:252)
> at
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.invoke(DefaultMBeanServerInterceptor.java:819)
> at
> com.sun.jmx.mbeanserver.JmxMBeanServer.invoke(JmxMBeanServer.java:801)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doOperation(RMIConnectionImpl.java:1466)
> at
> javax.management.remote.rmi.RMIConnectionImpl.access$300(RMIConnectionImpl.java:76)
> at
> javax.management.remote.rmi.RMIConnectionImpl$PrivilegedOperation.run(RMIConnectionImpl.java:1307)
> at
> javax.management.remote.rmi.RMIConnectionImpl.doPrivilegedOperation(RMIConnectionImpl.java:1399)
> at
> javax.management.remote.rmi.RMIConnectionImpl.invoke(RMIConnectionImpl.java:828)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:497)
> at sun.rmi.server.UnicastServerRef.dispatch(UnicastServerRef.java:323)
> at sun.rmi.transport.Transport$1.run(Transport.java:200)
> at sun.rmi.transport.Transport$1.run(Transport.java:197)
> at java.security.AccessController.doPrivileged(Native Method)
> at sun.rmi.transport.Transport.serviceCall(Transport.java:196)
> at
> sun.rmi.transport.tcp.TCPTransport.handleMessages(TCPTransport.java:568)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(TCPTransport.java:826)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.lambda$run$241(TCPTransport.java:683)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler$$Lambda$284/1694175644.run(Unknown
> Source)
> at java.security.AccessController.doPrivileged(Native Method)
> at
> sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(TCPTransport.java:682)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:745)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)