If some streaming sessions fail on decommission, decommission hangs
-------------------------------------------------------------------
Key: CASSANDRA-3730
URL: https://issues.apache.org/jira/browse/CASSANDRA-3730
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 1.1
Environment: FreeBSD
Reporter: Vitalii Tymchyshyn
Currently Cassandra does not handle StreamOutSession failures, e.g.:
// Instead of just not calling the callback on failure, we could have
// allow to register a specific callback for failures, but we leave
// that to a future ticket (likely CASSANDRA-3112)
if (callback != null && success)
    callback.run();
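The comment above suggests registering a separate callback for failures. A minimal Java sketch of that idea follows. It assumes a listener with distinct success and failure methods; OutSessionSketch and CompletionListener are hypothetical names, not Cassandra's StreamOutSession API, and the eventual fix (the comment points to CASSANDRA-3112) may look different.

```java
import java.util.Objects;

// Hypothetical sketch (not Cassandra code): a streaming session that always
// tells its listener how it ended, instead of skipping the callback on failure.
final class OutSessionSketch {
    /** Hypothetical listener with separate success and failure paths. */
    interface CompletionListener {
        void onSuccess();
        void onFailure(Throwable cause);
    }

    private final CompletionListener listener;
    private boolean closed = false;

    OutSessionSketch(CompletionListener listener) {
        this.listener = Objects.requireNonNull(listener, "listener");
    }

    /** Closes the session exactly once and reports the outcome either way. */
    synchronized void close(boolean success, Throwable cause) {
        if (closed) {
            return; // ignore repeated close() calls
        }
        closed = true;
        if (success) {
            listener.onSuccess();
        } else {
            listener.onFailure(cause != null ? cause : new RuntimeException("stream session failed"));
        }
    }
}
```

Guarding with a closed flag means a session that is closed both by a failed transfer and by a later cleanup path notifies its listener only once.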
Because the callback runs only on success, if during decommission a node that
receives the decommission data fails or (my case) the node being decommissioned
becomes overloaded, the streaming session fails and decommission is never
notified, so it hangs. This makes it hard to decommission overloaded nodes,
because I have to restart the node to retry the decommission.
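Why a lost failure notification turns into a hang can be illustrated with a minimal Java sketch. It assumes, hypothetically, that the decommissioning side waits on a java.util.concurrent.CountDownLatch with one count per outgoing session; the report does not describe the actual waiting mechanism, and DecommissionWaitSketch is an illustrative name only. If only successful sessions count the latch down, a single failure leaves the waiter blocked forever; counting down on both outcomes and recording failures lets it return and report the problem.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical decommission-style waiter (not Cassandra code). Each outgoing
// streaming session reports completion exactly once, successful or not.
final class DecommissionWaitSketch {
    private final CountDownLatch remaining;
    private final AtomicInteger failed = new AtomicInteger();

    DecommissionWaitSketch(int sessions) {
        this.remaining = new CountDownLatch(sessions);
    }

    /** Called once per session; success == false marks a failed session. */
    void sessionFinished(boolean success) {
        if (!success) {
            failed.incrementAndGet();
        }
        // Count down on BOTH paths. Counting down only on success would
        // leave await() blocked forever after a single failed session.
        remaining.countDown();
    }

    /**
     * Waits for all sessions, bounded by a timeout as a second safety net.
     * Returns true only if every session finished in time and none failed.
     */
    boolean awaitAll(long timeout, TimeUnit unit) {
        try {
            boolean allFinished = remaining.await(timeout, unit);
            return allFinished && failed.get() == 0;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve interrupt status
            return false;
        }
    }

    int failedCount() {
        return failed.get();
    }
}
```

With this shape, a failed session makes awaitAll return false promptly, so decommission could abort or be retried without restarting the node; the timeout only covers sessions that never report at all.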
I also see the following errors, caused by file streaming tasks trying to fetch
a streaming session that has already been closed by gossip:
ERROR [Streaming to /10.112.0.216:1] 2012-01-11 15:57:28,882 AbstractCassandraDaemon.java (line 138) Fatal exception in thread Thread[Streaming to /10.112.0.216:1,5,main]
java.lang.NullPointerException
    at org.apache.cassandra.streaming.FileStreamTask.runMayThrow(FileStreamTask.java:97)
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
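The trace shows a NullPointerException at FileStreamTask.java:97, but the report does not show which reference is null there. Assuming it is the session that gossip has already closed, one defensive pattern is to look the session up once and abort the task cleanly when it is gone. SessionRegistrySketch and streamFile are hypothetical names used only for illustration; this is not Cassandra's FileStreamTask code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch (not Cassandra code): a registry of active sessions that
// gossip may clear concurrently, and a streaming step that tolerates a
// session disappearing instead of dereferencing null.
final class SessionRegistrySketch {
    private final Map<Long, String> sessions = new ConcurrentHashMap<>();

    void open(long id, String description) {
        sessions.put(id, description);
    }

    /** Called when gossip decides the peer is gone; drops the session. */
    void closeByGossip(long id) {
        sessions.remove(id);
    }

    /**
     * Runs one hypothetical file-streaming step. Returns false, streaming
     * nothing, if the session was already removed, rather than throwing NPE.
     */
    boolean streamFile(long id, String file) {
        // Single lookup into a local: avoids a containsKey()/get() race with gossip.
        String session = sessions.get(id);
        if (session == null) {
            return false; // session closed concurrently; abort this task
        }
        // A real implementation would transfer `file` over the session here.
        return true;
    }
}
```

Returning a status instead of throwing lets the caller mark the session as failed, which is exactly the signal decommission needs in order not to hang.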