On 12/02/2012 02:56 AM, Hermes Flying wrote:
> Hi,
> For a cluster with 2 nodes I was explained what would happen. The other node
> will take over using fencing.

It will take over *after* fencing. Two separate concepts.

Fencing ensures that a lost node is truly gone and not just partitioned. Once
fencing succeeds and the lost node is known to be down, _then_ recovery of the
service(s) that had been running on the victim will begin.

> But in clusters with 3+ nodes what happens when corosync fails? I assume that
> if the communication fails with the primary, all other nodes consider
> themselves eligible to become primaries. Is this the case?

A corosync failure is treated as a failure of the node itself, so the node
will be removed from the cluster and fenced. Any services that had been
running on it may or may not be recovered, depending on the rules defined for
each service. If a service is recovered, where it is restarted also depends on
how that service was configured.

> 1) If a node has problem communicating with the primary AND has network
> problem with the rest of the network (clients) does it still try to become
> the primary (try to kill other nodes?)

Please drop the idea of pacemaker being "primary"; that's the wrong way to
look at it.

If pacemaker (via corosync) loses contact with its peer(s), then it checks the
quorum policy. If quorum is enabled, it checks to see if it has quorum. If it
does, it will try to fence its peer. If it doesn't, it will shut down any
services it might have been running. Likely in this case, one of the nodes
with quorum will fence it shortly.

> 2) In practice if the corosync fails but the primary is still up and running
> and serving requests, is primary attempted to be "killed" by the other
> nodes? Or you use some other way to figure out that this is a network failure,
> primary has not crashed?

Again, drop the notion of "primary". Whether a node tries to fence its peer is
a question of whether it has quorum (or whether quorum is disabled).

Failing corosync is the same as failing the whole node. Pacemaker will fail if
corosync dies.

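The quorum check described above can be sketched roughly as follows. This is only an illustration of the decision, not how pacemaker or corosync implement it: the names `has_quorum`, `on_peer_lost` and `Action` are made up for this example, and the simple-majority rule in `has_quorum` is an assumption (one vote per node, no special two-node handling).

```python
from enum import Enum


class Action(Enum):
    """Hypothetical labels for what a surviving node does next."""
    FENCE_PEER = "fence the lost peer, then recover its services"
    STOP_SERVICES = "stop local services; expect to be fenced"


def has_quorum(visible_votes: int, expected_votes: int) -> bool:
    # Assumption: quorum means a strict majority of the expected votes.
    return 2 * visible_votes > expected_votes


def on_peer_lost(quorum_enabled: bool, quorate: bool) -> Action:
    """Decide what a node does after losing contact with its peer(s)."""
    # If quorum is disabled, quorum is not consulted: the node may fence.
    if not quorum_enabled:
        return Action.FENCE_PEER
    # If quorum is enabled, only a node that has quorum may fence.
    if quorate:
        return Action.FENCE_PEER
    # No quorum: shut down local services. A quorate node will
    # likely fence this node shortly.
    return Action.STOP_SERVICES


if __name__ == "__main__":
    # Three-node cluster: two nodes still see each other, one is isolated.
    print(on_peer_lost(True, has_quorum(2, 3)).name)
    print(on_peer_lost(True, has_quorum(1, 3)).name)
```

In the three-node case, the two nodes that can still see each other keep quorum and fence the isolated node, while the isolated node stops its services. Recovery of services only starts once the fence has succeeded.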
> 3) Finally on corosync failure I assume the primary does nothing, as it does
> not care about the backups. Is this correct?

This question doesn't make sense.

> Thank you!

np

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
