Hi! Nice explanation! I have one question: with multiple rings, does Corosync expect the token to rotate at the same speed on both? I'm thinking of a scenario where the rings operate at different speeds: the token would rotate at the same speed under low or medium network load, but might rotate at different speeds once the slower ring is using its full bandwidth. I have the impression that Corosync would then mark one of the rings as faulty (for less than a second).
Regards,
Ulrich

>>> Digimer <[email protected]> wrote on 16.10.2013 at 16:44 in message
<[email protected]>:
> On 15/10/13 22:36, 邢立明 wrote:
>> Hello dear Heartbeat team:
>>
>> Thank you very much for your reply. I still have the following two
>> questions:
>>
>> 1. When the heartbeat link is disconnected, what events does Heartbeat trigger?
>> 2. When Heartbeat is disconnected, how can I make only one machine provide the service?
>
> Corosync uses the totem protocol for "heartbeat"-like monitoring of the
> other nodes' health. A token is passed around to each node; the node
> does some work (like acknowledging old messages and sending new ones), and
> then it passes the token on to the next node. This goes around and around
> all the time. Should a node not pass its token on within a short timeout
> period, the token is declared lost, an error count goes up, and a new
> token is sent. If too many tokens are lost in a row, the node is
> declared lost/dead.
>
> Once the node is declared lost, the remaining nodes reform a new
> cluster. If enough nodes are left to form quorum (a simple majority),
> then the new cluster will continue to provide services. In two-node
> clusters, quorum is disabled so each node can work on its own.
>
> Corosync itself only cares about cluster membership, message passing and
> quorum (as of corosync v2+). What happens after the cluster reforms is
> up to the cluster resource manager; in this case, that would be Pacemaker.
>
> When Pacemaker is told that membership has changed because a node died,
> it looks to see what services might have been lost. Once it knows what
> was lost, it looks at the rules it has been given and decides what to do.
>
> Generally, the first thing it does is "stonith" the lost node. This is a
> process where the lost node is powered off (called power fencing) or cut
> off from the network/storage (called fabric fencing). In either case, the
> idea is to make sure that the lost node is in a known state.
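[For readers following along in the archive: the token timing Digimer describes above is tunable in corosync.conf. A minimal sketch of the relevant totem options, with illustrative values only (defaults vary by version), including the rrp_mode setting for the multiple-ring case Ulrich asks about:]

```
# /etc/corosync/corosync.conf (fragment) -- illustrative values, not defaults
totem {
    version: 2

    # Timeout in milliseconds before a token is declared lost
    token: 1000

    # How many consecutive lost tokens before the node is declared dead
    token_retransmits_before_loss_const: 4

    # With redundant rings: "passive" alternates between rings,
    # "active" sends on both at once
    rrp_mode: passive
}
```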
> If this is skipped, the node could recover later and try to provide
> cluster services, not having realized that it was removed from the
> cluster. This could cause problems, from confusing switches to
> corrupting data.
>
> In two-node clusters, there is also a chance of a "split-brain". Because
> quorum has to be disabled, it is possible for both nodes to think the
> other node is dead and for both to try to provide the same cluster
> services. By using stonith, after the nodes break from one another
> (which could happen with a network failure, for example), neither node
> will offer services until one of them has stonith'ed the other. The
> faster node will win and the slower node will shut down (or be
> isolated). The survivor can then run services safely without risking a
> split-brain.
>
> Once the dead node has been stonithed, Pacemaker then decides what to do
> with the lost services. Generally, this means "restart the service here
> that had been running on the dead node". The details of this, though,
> are decided by you when you configure the resources in Pacemaker.
>
> Hope this helps! It's pretty high-level and simplifies a few things, but
> hopefully it helps you understand the mechanics. :)
>
> digimer
>
> PS - Please reply to the mailing list. Discussions like this can help
> others by being public and stored in archives.
>
> --
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
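[Archive footnote: the "simple majority" quorum rule and the two-node special case described in the quoted message can be sketched in a few lines. This is an illustration of the rule as explained above, not Corosync's actual implementation; the function name is made up for the example.]

```python
# Sketch of the quorum rule described above (not Corosync source code).
def has_quorum(nodes_total: int, nodes_alive: int, two_node: bool = False) -> bool:
    """Return True if the surviving partition may continue running services."""
    if two_node and nodes_total == 2:
        # Two-node clusters disable quorum; stonith must arbitrate instead,
        # so each surviving node is allowed to try to continue.
        return True
    # Strict majority: more than half of the configured nodes must survive.
    return nodes_alive > nodes_total // 2

# A 3-node cluster keeps quorum after losing one node...
assert has_quorum(3, 2)
# ...but a 4-node cluster split 2/2 has no quorum on either side,
# which is why neither half may offer services.
assert not has_quorum(4, 2)
```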
