Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker

Hermes Flying Sat, 01 Dec 2012 05:32:30 -0800

Thanks for your reply.
First of all, I didn't understand whether the VIP will also migrate if Tomcat 
or the load balancer fails. It will, right?
Also, if I understand this correctly, I can end up with the VIP on both nodes 
if corosync fails due to a network failure, and you suggest redundant 
communication paths to avoid this.
But as I understand the problem: the VIP runs on linux-1, and pacemaker on 
linux-2 is ready (via corosync) to take over on failure. If there is a network 
failure (despite redundant communication paths, unless you recommend some 
specific topology to people using Pacemaker that is 100% foolproof), how can 
linux-2 detect whether the other node has actually crashed or only the corosync 
communication has failed? In this case, won't linux-2 also "wake up" and take 
the VIP?
Could you please help me understand this?


Thank you!




________________________________
 From: David Coulson <[email protected]>
To: Hermes Flying <[email protected]>; General Linux-HA mailing list 
<[email protected]> 
Cc: Digimer <[email protected]> 
Sent: Saturday, December 1, 2012 2:46 PM
Subject: Re: [Linux-HA] Some help on understanding how HA issues are addressed 
by pacemaker
 

On 12/1/12 5:46 AM, Hermes Flying wrote:
> Thank you for this!
> 
> One last thing I need to clear up before digging into your configuration 
> specs etc.
> Since Pacemaker is, as you say, a fail-over system rather than a 
> load-balancing system (like Red Hat), my understanding is that one of my 
> nodes will have the VIP until:
> 1) Tomcat crashes and cannot restart (dead for some reason) --> Pacemaker 
> migrates VIP
> 
> 2) The network communication with the outside network is cut off. --> 
> Pacemaker migrates VIP
> 
> If these two are valid (are they?), then that means that there is no 
> primary/backup concept in Pacemaker (since I will assign the VIP to one of 
> my nodes and my installed load balancer will distribute the load among my 2 
> Tomcats), and as a result there cannot be a split-brain.
In the event of a split brain with Pacemaker, if you don't have any fencing 
configured, you will end up with your VIP running on both systems. Chances are 
that in your configuration it won't be a big deal, since your 
router/firewall/whatever will learn the ARP entry for one system, so you'll end 
up routing traffic properly - but it will be unpredictable and difficult to 
troubleshoot.
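
As a toy illustration of that outcome (this is not Pacemaker code, and the 
names Node and decide_without_fencing are made up for the example): each node 
only knows whether it still hears its peer, and with no fencing it simply 
starts the VIP once the peer goes silent.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    has_vip: bool = False
    hears_peer: bool = True  # True while cluster messages from the peer arrive


def decide_without_fencing(node: Node) -> None:
    """Toy policy (an assumption): with no fencing, a silent peer is treated as dead."""
    if not node.hears_peer:
        node.has_vip = True


linux1 = Node("linux-1", has_vip=True)
linux2 = Node("linux-2")

# Cut only the cluster link: both nodes stay up but stop hearing each other.
linux1.hears_peer = False
linux2.hears_peer = False

for node in (linux1, linux2):
    decide_without_fencing(node)

vip_holders = [node.name for node in (linux1, linux2) if node.has_vip]
print(vip_holders)  # ['linux-1', 'linux-2']
```

Both nodes end up holding the VIP, which is the split-brain case described 
above.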
> 
> Yet you imply that split-brain can occur even with Pacemaker if I don't have 
> fencing properly set.
> But how? Since it seems to me that Pacemaker does not have a notion of 
> primary/backup. Or you mean something else with "fail-over" system?
For each resource, Pacemaker knows there is a node where it is running, and 
'other' nodes where it is not running (but could if the node running it 
failed). So from a resource perspective, there is an active node and one or more 
backups.
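
A rough sketch of that per-resource view, assuming a simple ordered list of 
candidate nodes. Pacemaker's real placement logic weighs scores and 
constraints; the function place_resource and the list below are hypothetical 
stand-ins for that.

```python
from typing import Dict, List, Optional


def place_resource(candidates: List[str], online: Dict[str, bool]) -> Optional[str]:
    """Return the first online candidate, or None if no node can run the resource."""
    for node in candidates:
        if online.get(node, False):
            return node
    return None


# Hypothetical preference order for one resource (the VIP).
vip_candidates = ["linux-1", "linux-2"]

active = place_resource(vip_candidates, {"linux-1": True, "linux-2": True})
after_failure = place_resource(vip_candidates, {"linux-1": False, "linux-2": True})
print(active, after_failure)  # linux-1 linux-2
```

The "active" and "backup" roles belong to the resource, not to the node: a 
second resource could have the opposite preference order on the same two 
nodes.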
> 
> Additionally, you say that the "coordination" of Pacemaker instances is done 
> via corosync, which works over network messages, right?
> So what happens in the event of a communication/network failure that affects 
> only the communication paths used for corosync coordination and not the 
> communication path to the clients? I hope this question makes sense, as I am 
> new to these tools.
Split brain. That's why you need a redundant communications network, plus you 
need fencing.
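
To show why fencing helps, here is a toy model (again not Pacemaker code; 
fence, take_over and vip_holders_after_silence are hypothetical names). The 
surviving node cannot tell a crashed peer from a cut cluster link, so it powers 
the silent peer off before starting the VIP. In a real two-node split both 
sides may race to fence each other; only linux-2's attempt is modeled here.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    powered_on: bool = True
    has_vip: bool = False


def fence(target: Node) -> bool:
    """Hypothetical stand-in for a fence device: power the target off."""
    target.powered_on = False
    target.has_vip = False
    return True


def take_over(survivor: Node, silent_peer: Node) -> None:
    """Start the VIP only after the silent peer is confirmed off."""
    if fence(silent_peer):
        survivor.has_vip = True


def vip_holders_after_silence(peer_crashed: bool) -> list:
    # If linux-1 crashed it is already off; otherwise it is up and still holds the VIP.
    linux1 = Node("linux-1", powered_on=not peer_crashed, has_vip=not peer_crashed)
    linux2 = Node("linux-2")
    # linux-2 observes the same thing in both cases: linux-1 went silent.
    take_over(linux2, linux1)
    return [node.name for node in (linux1, linux2) if node.has_vip]


print(vip_holders_after_silence(peer_crashed=True))   # ['linux-2']
print(vip_holders_after_silence(peer_crashed=False))  # ['linux-2']
```

Either way exactly one node holds the VIP afterwards. Redundant communication 
paths make the "link cut" case rarer; fencing makes it safe when it happens 
anyway.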
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
