[ 
https://issues.apache.org/jira/browse/CASSANDRA-10244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-10244:
--------------------------------------
    Assignee:     (was: Joel Knighton)

> Replace heartbeats with locally recorded metrics for failure detection
> ----------------------------------------------------------------------
>
>                 Key: CASSANDRA-10244
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10244
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jason Brown
>
> In the current implementation, the primary purpose of sending gossip messages 
> is for delivering the updated heartbeat values of each node in a cluster. The 
> other data that is passed in gossip (node metadata such as status, dc, rack, 
> tokens, and so on) changes very infrequently (or rarely), such that the 
> eventual delivery of that data is entirely reasonable. Heartbeats, however, 
> are quite different. A continuous and nearly consistent delivery time of 
> updated heartbeats is critical for the stability of a cluster. It is through 
> the receipt of the updated heartbeat that a node determines the reachability 
> (UP/DOWN status) of all peers in the cluster. The current implementation of 
> FailureDetector measures the time differences between the heartbeat updates 
> received about a peer (Note: I said about a peer, not from the peer directly, 
> as those values are disseminated via gossip). Without a consistent time 
> delivery of those updates, the FD, via it's use of the PHI-accrual 
> algorigthm, will mark the peer as DOWN (unreachable). The two nodes could be 
> sending all other traffic without problem, but if the heartbeats are not 
> propagated correctly, each of the nodes will mark the other as DOWN, which is 
> clearly suboptimal to cluster health. Further, heartbeat updates are the only 
> mechanism we use to determine reachability (UP/DOWN) of a peer; dynamic 
> snitch measurements, for example, are not included in the determination. 
> To illustrate this, in the current implementation, assume a cluster of nodes: 
> A, B, and C. A partition starts between nodes A and C (no communication 
> succeeds), but both nodes can communicate with B. As B will get the updated 
> heartbeats from both A and C, it will, via gossip, send those over to the 
> other node. Thus, A thinks C is UP, and C thinks A is UP. Unfortunately, due 
> to the partition between them, all communication between A and C will fail, 
> yet neither node will mark the other as down because each is receiving, 
> transitively via B, the updated heartbeat about the other. While it's true 
> that the other node is alive, only having transitive knowledge about a peer, 
> and allowing that to be the sole determinant of UP/DOWN reachability status, 
> is not sufficient for a correct and effieicently operating cluster. 
> This transitive availability is suboptimal, and I propose we drop the 
> heartbeat concept altogether. Instead, the dynamic snitch should become more 
> intelligent, and it's measurements ultimately become the input for 
> determining the reachability status of each peer(as fed into a revamped FD). 
> As we already capture latencies in the dsntich, we can reasonably extend it 
> to include timeouts/missed responses, and make that the basis for the UP/DOWN 
> decisioning. Thus we will have more accurate and relevant peer statueses that 
> is tailored to the local node.  



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to