[ https://issues.apache.org/jira/browse/CASSANDRA-2947?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13070726#comment-13070726 ]
Brandon Williams commented on CASSANDRA-2947:
---------------------------------------------
bq. The problem is that the failure detector never learns about node B -
FD.report is never called for B.
This isn't quite right: the FD knows about B and is still calculating phi for
it, but for some reason it is never reported:
{noformat}
TRACE 20:12:12,201 Performing status check ...
TRACE 20:12:12,201 PHI for /10.179.111.137 : 19.923693651793577
TRACE 20:12:12,201 PHI for /10.179.65.102 : 0.43267044519934433
{noformat}
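As a rough illustration of how PHI values like the ones in the trace are usually read, here is a minimal sketch. It assumes an exponential inter-arrival model for heartbeats and a conviction threshold of 8 (the documented default for phi_convict_threshold). The names phi, should_convict and PHI_CONVICT_THRESHOLD are hypothetical and are not taken from the Cassandra source.

```python
import math

# Assumption: 8 is the documented default for phi_convict_threshold.
PHI_CONVICT_THRESHOLD = 8.0


def phi(seconds_since_last_heartbeat, mean_interval_seconds):
    """Suspicion level under an exponential inter-arrival model (an assumption).

    P(next heartbeat arrives later than t) = exp(-t / mean), and
    phi = -log10(P), which simplifies to t / (mean * ln 10).
    """
    if mean_interval_seconds <= 0:
        raise ValueError("mean interval must be positive")
    return seconds_since_last_heartbeat / (mean_interval_seconds * math.log(10))


def should_convict(phi_value, threshold=PHI_CONVICT_THRESHOLD):
    """A node is a candidate for being marked down once phi exceeds the threshold."""
    return phi_value > threshold


# The two PHI values copied from the trace above.
trace = [
    ("/10.179.111.137", 19.923693651793577),
    ("/10.179.65.102", 0.43267044519934433),
]
for endpoint, value in trace:
    print(endpoint, "over threshold" if should_convict(value) else "under threshold")
```

Under these assumptions the first endpoint is well past the threshold and the second is nowhere near it, which is consistent with the FD tracking a dead node without that node ever being marked down.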
> New nodes always think dead nodes are alive
> -------------------------------------------
>
> Key: CASSANDRA-2947
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2947
> Project: Cassandra
> Issue Type: Bug
> Affects Versions: 0.8.2
> Reporter: Richard Low
> Attachments: 2947.txt
>
>
> If a new node is brought up while another node is down, the new node will
> think the down node is up forever.
> To reproduce:
> Take nodes A, B and C.
> 1. Bring up nodes A and B in a cluster
> 2. Take down B and wait for A to mark it as down
> 3. Bring up C with A as a seed
> 4. nodetool ring on C shows all 3 nodes as up and never marks B as down
> The problem is that the failure detector never learns about node B -
> FD.report is never called for B. This means requests are constantly routed
> to B from C and time out, but they should fail with UnavailableException.
> The attached (hack) patch appears to fix it, but I expect the problem is
> actually elsewhere in the gossip code.
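The difference between the two failure modes named in the description can be sketched with a toy coordinator. This is a simplified model, not Cassandra code: coordinate, UnavailableError, TimedOutError and outcome are hypothetical names. It only assumes that the coordinator rejects a request up front when too few replicas are believed alive, and otherwise waits for replies.

```python
class UnavailableError(Exception):
    """Stand-in for UnavailableException: rejected before any replica is contacted."""


class TimedOutError(Exception):
    """Stand-in for a timeout: a contacted replica never answered."""


def coordinate(replicas, believed_alive, actually_alive, required):
    """Toy request path using only the coordinator's own view of liveness."""
    live = [r for r in replicas if r in believed_alive]
    if len(live) < required:
        # Fail fast: the coordinator already knows it cannot satisfy the request.
        raise UnavailableError("not enough live replicas")
    # The request is sent to every replica believed alive; only running ones reply.
    acks = sum(1 for r in live if r in actually_alive)
    if acks < required:
        raise TimedOutError("replicas did not respond in time")
    return acks


def outcome(believed_alive):
    """Run one request for data owned by B while B is really down."""
    try:
        coordinate(["B"], believed_alive, actually_alive={"A", "C"}, required=1)
    except UnavailableError:
        return "unavailable"
    except TimedOutError:
        return "timed out"
    return "ok"


print(outcome({"A", "C"}))       # C has the correct view of B
print(outcome({"A", "B", "C"}))  # C wrongly believes B is up
```

With the correct view the request is rejected immediately, while with the stale view it is sent to B and can only end in a timeout, which is the behaviour reported for node C.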