But the VIP service works in coordination with corosync, right? I mean, when
you say:

"if the node hosting the VIP fails, pacemaker may try to restart it or it might
relocate it, depending on how you've configured things"

what does "failure" mean? That the service crashed, OR also that corosync
failed (so the node cannot reach the rest of the nodes)?

________________________________
 From: Digimer <[email protected]>
To: Hermes Flying <[email protected]> 
Cc: General Linux-HA mailing list <[email protected]> 
Sent: Sunday, December 2, 2012 8:46 PM
Subject: Re: [Linux-HA] Corosync on cluster with 3+ nodes
 
As I said, each _service_ can have the concept of "primary", just not
pacemaker itself. I gave an example earlier:

Pacemaker might have two services:
* DRBD, active on all nodes.
* VIP, active on one node only.

In this example, the DRBD service is Active/Active. If it fails on a
given node, pacemaker will try to restart it there. If that fails, it
will *not* relocate. Here, there is no "primary".
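
As a rough sketch (the resource and parameter names here are made up, and
the exact syntax depends on your pacemaker/crmsh version), an
Active/Active DRBD resource in crm configure terms might look something
like:

```
# Hypothetical crm configure snippet -- names/values are examples only.
primitive p_drbd ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="30s" role="Master"
# Dual-primary: promoted to master on both nodes at once.
ms ms_drbd p_drbd \
    meta master-max="2" clone-max="2" notify="true" interleave="true"
```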

The VIP, on the other hand, runs on one node at a time only. Generally
it will start on the first active node, but you might configure it to
prefer one node. If that preferred node comes online later, pacemaker
will migrate the VIP to it. If there is no preferred node, then the VIP
will stay
where it is. If the node hosting the VIP fails, pacemaker may try to
restart it or it might relocate it, depending on how you've configured
things. In this case, the VIP service has the concept of "primary",
though it's better to think of it as "Active".
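
To make that concrete, here is a sketch in crm configure syntax (the IP,
node name, and scores are made up for illustration):

```
# Hypothetical VIP resource -- adjust ip/netmask for your network.
primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip="192.168.1.100" cidr_netmask="24" \
    op monitor interval="10s"
# Optional preference: pacemaker migrates the VIP to node1 when it's up.
location loc_vip_prefer p_vip 100: node1
# Without a preference, stickiness keeps the VIP where it is.
rsc_defaults resource-stickiness="100"
```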

Make sense?

On 12/02/2012 01:35 PM, Hermes Flying wrote:
> Hi,
> So you are saying I should not use the notion of "primary", OK.
> When I have 3 nodes, won't 1 node have the VIP? How is this node defined
> in Pacemaker's terminology if "primary" is inappropriate?
> 
> Best Regards
> 
> ------------------------------------------------------------------------
> *From:* Digimer <[email protected]>
> *To:* Hermes Flying <[email protected]>; General Linux-HA mailing
> list <[email protected]>
> *Sent:* Sunday, December 2, 2012 8:22 PM
> *Subject:* Re: [Linux-HA] Corosync on cluster with 3+ nodes
> 
> On 12/02/2012 02:56 AM, Hermes Flying wrote:
>> Hi,
> >> For a cluster with 2 nodes, it was explained to me what would happen.
> >> The other node will take over using fencing.
> 
> It will take over *after* fencing. Two separate concepts.
> 
> Fencing ensures that a lost node is truly gone and not just partitioned.
> Once fencing succeeds and the lost node is known to be down, _then_
> recovery of service(s) that had been running on the victim will begin.
> 
> >> But in clusters with 3+ nodes, what happens when corosync fails? I
> >> assume that if the communication fails with the primary, all other
> >> nodes consider themselves eligible to become primaries. Is this the case?
> 
> Corosync failing will be treated as a failure of the node, and the node
> will be removed and fenced. Any services that had been running on it may
> or may not be recovered, depending on the rules defined for each
> service. If a service is recovered, where it is restarted depends on
> how that service was configured.
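> 
> As a rough sketch of what such rules can look like (the resource name,
> agent, and numbers below are illustrative, not from any real config),
> per-service recovery behaviour is set via meta attributes and operation
> options in crm configure syntax:
> 
> ```
> # Hypothetical example: restart locally up to 2 failures, then relocate.
> primitive p_app ocf:heartbeat:apache \
>     meta migration-threshold="2" failure-timeout="60s" \
>     op monitor interval="15s" on-fail="restart"
> ```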
> 
> >> 1) If a node has problems communicating with the primary AND has a
> >> network problem with the rest of the network (clients), does it still
> >> try to become the primary (try to kill the other nodes)?
> 
> Please drop the idea of pacemaker being "primary"; that's the wrong way
> to look at it.
> 
> If pacemaker (via corosync) loses contact with its peer(s), it checks
> the quorum policy. If quorum is enabled, it checks whether it still has
> quorum. If it does, it will try to fence its peer. If it doesn't, it
> will shut down any services it might have been running. Likely, in this
> case, one of the nodes with quorum will fence it shortly.
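> 
> Concretely, that behaviour is driven by cluster properties, roughly
> like this (a sketch in crm configure syntax; the values are examples):
> 
> ```
> # Hypothetical: fence lost nodes; without quorum, stop local services.
> property stonith-enabled="true" \
>     no-quorum-policy="stop"
> ```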
> 
> >> 2) In practice, if corosync fails but the primary is still up,
> >> running, and serving requests, do the other nodes attempt to "kill"
> >> the primary? Or do you use some other way to figure out that this is
> >> a network failure and the primary has not crashed?
> 
> Again, drop the notion of "primary". Whether a node tries to fence its
> peer is a question of whether it has quorum (or if quorum is disabled).
> Failing corosync is the same as failing the whole node. Pacemaker will
> fail if corosync dies.
> 
> >> 3) Finally, on corosync failure, I assume the primary does nothing,
> >> as it does not care about the backups. Is this correct?
> 
> This question doesn't make sense.
> 
>> Thank you!
> 
> np
> 
> -- 
> Digimer
> Papers and Projects: https://alteeve.ca/w/
> What if the cure for cancer is trapped in the mind of a person without
> access to education?
> 
> 


-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems