FailureDetector can take a very long time to mark a host down
-------------------------------------------------------------
Key: CASSANDRA-3273
URL: https://issues.apache.org/jira/browse/CASSANDRA-3273
Project: Cassandra
Issue Type: Bug
Components: API
Reporter: Brandon Williams
Assignee: Brandon Williams
There are two ways to trigger this:
* Bring a node up very briefly in a mixed-version cluster and then terminate it
* Bring a node up, terminate it for a very long time, then bring it back up and
take it down again
In the first case, the versioning logic requires a reconnect, which can happen
very quickly, so a very short arrival interval gets recorded. This can easily
be solved by rejecting any interval that falls below a reasonable bound, for
instance the gossiper interval.
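
A rough sketch of that lower-bound check, for illustration only. This is not
Cassandra's code: the ArrivalWindow class, its add() method, the window size,
and the 1000 ms bound are all hypothetical names and values picked for the
example.

```python
from collections import deque

# Assumed value for illustration; the real gossiper interval may differ.
GOSSIP_INTERVAL_MS = 1000


class ArrivalWindow:
    """Hypothetical sliding window of heartbeat inter-arrival times."""

    def __init__(self, size=1000, min_interval_ms=GOSSIP_INTERVAL_MS):
        self.intervals = deque(maxlen=size)
        self.min_interval_ms = min_interval_ms
        self.last_ms = None

    def add(self, now_ms):
        """Record a heartbeat; return True if an interval was stored."""
        stored = False
        if self.last_ms is not None:
            interval = now_ms - self.last_ms
            # Skip implausibly short intervals, such as those produced by
            # a quick reconnect, so they cannot drag the mean down.
            if interval >= self.min_interval_ms:
                self.intervals.append(interval)
                stored = True
        # Always remember the latest arrival, even when it was rejected.
        self.last_ms = now_ms
        return stored
```

With this sketch, a heartbeat arriving 5 ms after the previous one is dropped,
while one arriving a full gossip interval later is kept. In practice the bound
might need to sit somewhat below the gossip interval to tolerate jitter.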
The second case is harder to solve: an extremely large interval is recorded,
namely the time the node was left dead the first time. This throws off the
mean of the intervals, so it takes much longer to mark the node down the
second time.
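
To make the skew concrete, here is a small worked example under a simplified
model in which suspicion grows as elapsed time divided by the mean interval.
The threshold of 8, the one-hour outage, and the helper name
time_to_convict_ms are assumptions for illustration; the real detector's
formula and constants may differ.

```python
# Assumed conviction threshold for this simplified model.
PHI_CONVICT_THRESHOLD = 8


def time_to_convict_ms(intervals, threshold=PHI_CONVICT_THRESHOLD):
    """Silence (in ms) needed before phi = elapsed / mean hits threshold."""
    mean_interval = sum(intervals) / len(intervals)
    return threshold * mean_interval


# 1000 regular one-second heartbeats.
healthy = [1000] * 1000
# Same window, but one sample is the hour the node spent dead.
skewed = [1000] * 999 + [3_600_000]

healthy_ms = time_to_convict_ms(healthy)  # mean is 1000 ms
skewed_ms = time_to_convict_ms(skewed)    # mean is 4599 ms
```

In this model a single one-hour sample in a 1000-entry window raises the mean
from 1000 ms to 4599 ms, so the node has to stay silent roughly 4.6 times
longer before it is marked down.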