Re: [Linux-HA] Need help in setting up a two node cluster in different subnet

2013-03-25 Thread David Coulson
Assuming the two systems can already route to each other, just configure
udpu as your transport.

Example config:

http://lists.corosync.org/pipermail/discuss/2011-October/48.html
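
For reference, a minimal sketch of the relevant totem section (the addresses
are placeholders; with nodes on different subnets, bindnetaddr is the local
network and so differs per node):

    totem {
        version: 2
        transport: udpu
        interface {
            ringnumber: 0
            bindnetaddr: 10.0.1.0
            mcastport: 5405
            member {
                memberaddr: 10.0.1.10
            }
            member {
                memberaddr: 10.0.2.20
            }
        }
    }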

On Mar 24, 2013, at 11:59 AM, deep saran wrote:

 Hi all,
 
 I am trying to set up a two-node cluster where the nodes are in different
 subnets. I am using corosync/pacemaker for this. I am a newbie in this field
 and have no idea how to configure corosync/pacemaker to work across different
 subnets. My initial investigation suggests that it is possible using VIPArip
 along with ripd/Quagga, but I don't understand how to configure them to work
 in my networking environment. Can anyone suggest a document or demonstrate
 with an example? I would be really grateful.
 
 Please help me if you have any prior experience in this field.
 
 Thanks in advance!!!
 
 Regards,
 Deep


Re: [Linux-HA] Many Resources Dependent on One Resource Group

2013-03-24 Thread David Coulson
On 3/24/13 12:58 PM, Robinson, Eric wrote:
 In the simplest terms, we currently have resources:

 A = drbd
 B = filesystem
 C = cluster IP
 D thru J = mysql instances.

 Resource group G1 consists of resources B through J, in that order, and is 
 dependent on resource A.

 This fails over fine, but it has the serious disadvantage that if you stop or 
 remove a mysql resource in the middle of the list, all of the ones after it 
 stop too. For example, if you stop G, then H thru J stop as well.

 We want to change it so that resource group G1 consists only of resources
 B and C. All of the mysql instances (D thru J) are individually dependent on
 group G1, but not dependent on each other. That way you can stop or remove a 
 mysql resource without affecting the others.

 I saw this scenario described in the Pacemaker docs, but I cannot find an 
 example of the syntax.
Try adding this to the group:

   meta ordered=false

Or you could take the MySQL instances out of the group and make them
each individually dependent on drbd/filesystem with a colocation/order
ruleset.
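
A sketch of the constraint approach in crm shell, with g1 holding B and C and
mysqlD standing in for one instance (the names are hypothetical):

    colocation mysqlD-with-g1 inf: mysqlD g1
    order g1-before-mysqlD inf: g1 mysqlD

Repeat the pair for each of D through J; the instances then start and stop
independently of one another.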

Also, you might want to use lrmadmin to set 'max-children' to something
more than 4; otherwise it'll only start up 4 MySQL instances at once.
Obviously this depends on how much horsepower you have, but we set it to
32 or 64 (although most of our resources are just IPaddr).
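
Syntax sketch, assuming your version spells the lrmd parameter max-children:

    lrmadmin -p max-children 32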


Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker

2012-12-01 Thread David Coulson

On 12/1/12 5:46 AM, Hermes Flying wrote:
 Thank you for this!

 One last thing I need to clear up before digging into your configuration
 specs, etc.
 Since Pacemaker is a fail-over system rather than a load-balancing system
 (like Red Hat's), as you say, my understanding is that one of my nodes will
 have the VIP until:
 1) Tomcat crashes and cannot restart (dead for some reason) -- Pacemaker
 migrates the VIP.

 2) The network communication with the outside network is cut off. --
 Pacemaker migrates the VIP.

 If these two are valid (are they?), then that means there is no
 primary/backup concept with Pacemaker (since I will assign the VIP to one of
 my nodes and my installed load balancer will distribute the load among my two
 Tomcats), and as a result there cannot be a split-brain.
In the event of a split brain with Pacemaker, if you don't have any
fencing configured, you will end up with your VIP running on both
systems. Chances are that in your configuration it won't be a big deal, since
your router/firewall/whatever will learn the ARP of one system, so you'll end
up routing traffic properly - but it will be unpredictable and
difficult to troubleshoot.

 Yet you imply that split-brain can occur even with Pacemaker if I don't have
 fencing properly set up.
 But how? It seems to me that Pacemaker does not have a notion of
 primary/backup. Or do you mean something else by "fail-over system"?
For each resource, Pacemaker knows there is a node where it is running,
and 'other' nodes where it is not running (but could, if the node running
it failed). So from a resource perspective, there is an active node and
one or more backups.

 Additionally, you say that the coordination of Pacemaker instances is done
 via corosync, which is over network messages, right?
 So what happens in the event of a communication/network failure that affects
 only the paths used for corosync coordination, and not the communication
 path with the clients? Hope this question makes sense, as I am new to this.
Split brain. That's why you need a redundant communications network, 
plus you need fencing.
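
For the redundant-network part, corosync can run two rings natively; a
sketch, assuming two independent networks (addresses are placeholders):

    totem {
        rrp_mode: passive
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.1.0
        }
        interface {
            ringnumber: 1
            bindnetaddr: 192.168.2.0
        }
    }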


Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker

2012-12-01 Thread David Coulson

On 12/1/12 8:21 AM, Hermes Flying wrote:
 Thanks for your reply.
 First of all, I didn't get whether the VIP will migrate if Tomcat or the
 load balancer fails. It will, right?
If you configure Pacemaker correctly, yes.
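
A minimal crm shell sketch of that wiring (resource names and the IP are
hypothetical, and the tomcat agent's own params such as catalina_home are
omitted for brevity):

    primitive p_tomcat ocf:heartbeat:tomcat \
        op monitor interval=30s
    primitive p_vip ocf:heartbeat:IPaddr2 \
        params ip=192.0.2.10 cidr_netmask=24 \
        op monitor interval=10s
    colocation vip-with-tomcat inf: p_vip p_tomcat
    order tomcat-before-vip inf: p_tomcat p_vip

If Tomcat fails and can't be restarted locally, Pacemaker moves it, and the
VIP with it, to the other node.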
 Also, if I understand this correctly, I can end up with the VIP on both
 nodes if corosync fails due to network failure. And you suggest
 redundant communication paths to avoid this.
 But if I understand the problem: if the VIP runs on my linux-1 and
 Pacemaker on linux-2 is, via corosync, ready to take over on failure,
 then when there is a network failure (despite redundant communication
 paths, unless you recommend some specific topology that is 100%
 foolproof), how can you detect whether the other node has actually
 crashed or just corosync has failed? In this case, won't linux-2 also
 wake up and take the VIP?
That is what fencing is for. If linux-1 goes offline from the 
perspective of linux-2, linux-2 will attempt to 
crash/power-cycle/power-off linux-1 to ensure it is really dead. Any 
resource previously running on linux-1 will be started on linux-2.
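
A hedged sketch of what that fencing might look like with IPMI-capable
servers (the agent choice, addresses, and credentials are all assumptions
about your hardware):

    primitive fence-linux1 stonith:external/ipmi \
        params hostname=linux-1 ipaddr=10.0.0.101 userid=admin passwd=secret interface=lan \
        op monitor interval=60s
    location l-fence-linux1 fence-linux1 -inf: linux-1
    property stonith-enabled=true

The location rule keeps a node from being responsible for fencing itself.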

Usually with a two-node config I take two NICs on each box and connect
them directly to the other one - it would also work if you had two separate
switches you could run each path through. Then I use Linux NIC bonding
to provide redundancy and run corosync over the bond interface.
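
A RHEL-style ifcfg sketch of that bonding setup (addresses are examples):

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.100.1
    NETMASK=255.255.255.0
    ONBOOT=yes
    BONDING_OPTS="mode=active-backup miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (repeat for the second NIC)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes

Then point corosync's bindnetaddr at the bond's network (192.168.100.0 here).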


Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker

2012-12-01 Thread David Coulson

On 12/1/12 8:48 AM, Hermes Flying wrote:
 Great help! Please allow me to trouble you with one last question.

 If I get this: when I use fencing and corosync fails, then linux-2
 will attempt to crash linux-1 and take over. At this point, though,
 linux-1 won't try to do anything, right? Since it knows it is the
 primary, I mean.

linux-1 will be powered off or crashed, so I think that speaks for itself.

 Then you say: "Any resource previously running on linux-1 will be
 started on linux-2."
 Now at this point: by "resource" you mean only Pacemaker and its related
 modules, right? Because I want Tomcat to be up and running and
 receiving requests on linux-2 as well, which will be forwarded by the load
 balancer on linux-1. Is this correct?

I mean 'resources managed by Pacemaker'. So if your VIP was running on
linux-1, and it fails, and linux-2 fences it, the only place the VIP can
run is linux-2. linux-1 is totally down.

 Also, in your setup of 2 NICs or 2 switches, I assume the idea is
 that the probability of split-brain due to network failure is very low,
 right? Because I have read that it is not possible to avoid
 split-brain without adding a third node. But I may be misunderstanding
 this.
A third node will eliminate split brain by definition, as quorum will 
only be obtained if a minimum of two nodes are available.
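
In a two-node cluster there is no majority to win, which is why two-node
deployments typically disable quorum enforcement and lean on fencing
instead; a crm shell sketch:

    # two nodes: quorum can never break a tie, so ignore it and rely on fencing
    crm configure property no-quorum-policy=ignore
    # three or more nodes: stop resources in a partition that loses quorum
    crm configure property no-quorum-policy=stop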

If you have a diverse network configuration and good change management,
you're probably not going to experience a split brain unless you have a
substantial environment failure that would probably impact your clients'
ability to access anything. Since you are not running shared storage,
you're not going to experience data loss, which is typically the biggest
concern with split brain.



Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker

2012-12-01 Thread David Coulson
Already told you. If you are running two nodes, make sure your fencing
works and you have reliable connectivity between the nodes.

If that isn't good enough, add a third node.

On 12/1/12 8:58 AM, Hermes Flying wrote:
 Actually, each Tomcat uses a back-end database that has the notion of
 primary/backup.
 I am trying to figure out whether, by using Pacemaker facilities, I can
 avoid split-brain in the database as well. From what you have described so
 far I seem to get away with it, meaning that with fencing, linux-1 will
 stop, so the secondary database on linux-2 will become primary.
 Am I on the right track here? If you have any recommendations for my
 setup (2 Linux boxes running 2 LBs / 2 Tomcats / 2 databases), please let me know!
 Thank you for your time!


 


Re: [Linux-HA] Active/Active

2012-08-02 Thread David Coulson
From your post on the drbd list it looks like your storage is messed up. 

Post the output of crm_mon -1fr from both nodes. I'm guessing your drbd
resource didn't promote to master on one node because your drbd cluster is
out of sync.

Sent from my iPad

On Aug 2, 2012, at 4:22 AM, Yount, William D 
yount.will...@menloworldwide.com wrote:

 I am using CentOS 6.2, mainly because that is what the Clusters from
 Scratch documentation uses.
 I am using GFS2.
 I am using GFS2.
 
 Yeah, I had some issues with the cman/corosync/pacemaker startup order at
 first. But CentOS uses chkconfig, which allowed me to turn corosync off,
 cman on, and pacemaker on. Ever since, I haven't had an issue with the
 cluster coming up on its own. Previously I had to stop corosync, start cman,
 and then start pacemaker.
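 
 For reference, that boot-order change is just:
 
    chkconfig corosync off
    chkconfig cman on
    chkconfig pacemaker on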
 
 I do seem to have some issues with the cluster recognizing when a resource
 is down, but I believe that may be tied to Pacemaker, which I am using for
 fencing. There is something wrong with my cib.xml configuration. All the
 resources seem to be pinned to Node2. If I cut Node2 off or unplug a
 network cable from Node2, then my cluster goes down. Even my cloned IP
 address goes down.
 
 
 -Original Message-
 From: linux-ha-boun...@lists.linux-ha.org 
 [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Bruno MACADRE
 Sent: Thursday, August 02, 2012 2:38 AM
 To: linux-ha@lists.linux-ha.org
 Subject: Re: [Linux-HA] Active/Active
 
 Hi,
 
 I'm working on a cluster of the same type as yours, and I still ask myself
 some questions. Any feedback will be appreciated:
 
 - What OS did you use?
 I do a lot of work on Ubuntu 12.04 Server, but I'm not satisfied with the
 cluster stability, mainly with OCFS2.
 
 - Do you use a clustered filesystem (like OCFS2 or GFS2) on your
 dual-primary DRBD? Did you have special concerns?
 I had a lot of concerns using OCFS2 on Ubuntu Server, because it needs a
 cman stack on top of corosync, which made the entire cluster unstable. For
 example: when I do 'node standby node1' it kills Pacemaker without any
 possibility of coming back online at all!
 
 - Have you encountered any obstacles when NFS resources switch during
 high load?
 
 I'll gladly take a look at your cib.xml file if you wish.
 
 Regards,
 Bruno
 
 Le 01/08/2012 11:22, Yount, William D a écrit :
 I was wondering if someone could look over my cib.xml file and see if
 I need to change anything. I am attempting to create an Active/Active
 cluster offering up a DRBD volume as an NFS share. Everything works fine as
 it is, but I would like someone more knowledgeable to look it over at their
 leisure.
 
 
 
 Thank you,
 william
 
 
 
 -- 
 
 Bruno MACADRE
 ---
  Ingénieur Systèmes et Réseau | Systems and Network Engineer
  Département Informatique | Department of computer science
  Responsable Réseau et Téléphonie | Telecom and Network Manager
  Université de Rouen  | University of Rouen
 ---
 Coordonnées / Contact :
Université de Rouen
Faculté des Sciences et Techniques - Madrillet
Avenue de l'Université - BP12
76801 St Etienne du Rouvray CEDEX
FRANCE
 
Tél : +33 (0)2-32-95-51-86
Fax : +33 (0)2-32-95-51-87
 ---
 

Re: [Linux-HA] Question about Pacemaker Load-balancing

2012-06-04 Thread David Coulson


On 6/4/12 7:23 AM, alain.mou...@bull.net wrote:
 Hi

 Some questions about HA/Pacemaker and load balancing:

 - is it possible and safe to make LVS (based on heartbeat
 software) and Pacemaker/corosync work together?
LVS isn't based on heartbeat. You can use pacemaker to manage your VIPs, 
and just leave LVS running in the background. You will need something to 
monitor your backend systems and update LVS tables when they 
fail/recover - ldirectord does a good job of that.
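
A minimal ldirectord.cf sketch (the VIP, real-server addresses, and
health-check URL are placeholders):

    # /etc/ha.d/ldirectord.cf
    checktimeout=10
    checkinterval=5
    virtual=192.0.2.10:80
            real=10.0.0.11:80 gate
            real=10.0.0.12:80 gate
            service=http
            request="/health.html"
            receive="OK"
            scheduler=wlc
            protocol=tcp

ldirectord drops a real server from the LVS table when its check fails and
re-adds it on recovery.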

 - is there another known and efficient way to provide load-balancing
 functionality on a Pacemaker cluster?
Depends what your application is - We use a mixture of LVS and HAProxy.


Re: [Linux-HA] DRBD Concept Doubt

2012-05-20 Thread David Coulson
What is your clustering software, and what is its configuration? Also
post your DRBD configuration and the output of cat /proc/drbd during
each stage of the testing that reproduces the issue.

Posting some kernel logs would be helpful too. Simply switching
pri/sec on DRBD won't cause a node to go outdated, unless you split-brain
the environment.
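
Something like this from both nodes, captured at each step, is usually
enough (the dmesg filter is just a suggestion):

    cat /proc/drbd
    drbdadm dump all
    dmesg | grep -i drbd | tail -50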

David

On 5/20/12 9:25 AM, Net Warrior wrote:
 Hi there, list!

 I've got a doubt regarding DRBD usage. At the moment I'm trying to
 implement an HA system with two nodes; it's an easy and basic setup:
 two servers, running Oracle and LVM.

 I configured one resource, let's say /dev/rootvg/myoracle-device, on
 both, and this is working fine. I can perform a manual failover as follows:

 umount /dev/drbd1
 drbdadm secondary node1

 drbdadm primary node2
 mount /dev/drbd1

 This works fine and I have both servers synchronized. My problem, or
 doubt, is: when the other node fails, let's say I power it off, node2
 takes the primary role (I do it manually), but the information is
 Outdated, I lose lots of information, and I have to wait until node1
 comes up to synchronize with it.

 So, does DRBD work like that? I thought DRBD was synchronizing in the
 background to the other node, so that both nodes have the same
 information. PLEASE correct me if I'm wrong, because maybe this solution
 is not well implemented/configured, or I misunderstood what DRBD is for.


 Thanks for your time and support
 Best regards






Re: [Linux-HA] Heartbeat Failover Configuration Question

2012-04-23 Thread David Coulson
Why even use heartbeat then? Just manually ifconfig the interface.
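
For example (the address and interface are placeholders; the arping sends
gratuitous ARP so neighbors update their caches):

    ifconfig eth0:0 192.0.2.10 netmask 255.255.255.0 up
    arping -U -I eth0 -c 3 192.0.2.10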

On 4/23/12 7:39 AM, Net Warrior wrote:
 Hi Nikita

 This is the version:
 heartbeat-3.0.0-0.7

 My aim is that if node1 is powered off or loses its ethernet
 connection, node2 won't fail over automatically; I want to
 do it manually, but I could not find how to accomplish that.


 Thanks for your time and support
 Best regards



 2012/4/23, Nikita Michalko michalko.sys...@a-i-p.com:
 Hi, Net Warrior!


 What version of HA/Pacemaker do you use?
 Did you already RTFM - e.g.
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained
 - or:
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch


 HTH


 Nikita Michalko

 On Monday, 23 April 2012 02:23:20, Net Warrior wrote:
 Hi There

 I configured heartbeat to fail over an IP address. If I, for example,
 shut down one node, the other takes its IP address; so far so good. Now my
 doubt is whether there is a way to configure it not to fail over
 automatically and have someone run the failover manually. Can you provide
 a configuration example, please? Is this stanza the one that does the
 magic?

 auto_failback on


 Thanks for your time and support
 Best regards


Re: [Linux-HA] clvm/dlm/gfs2 hangs if a node crashes

2012-03-22 Thread David Coulson


On 3/22/12 2:43 PM, William Seligman wrote:


 I still haven't solved the problem, but this advice has gotten me further
 than before.

 First, Lars was correct: I did not have execute permissions set on my fence
 peer scripts. (D'oh!) I turned them on, but that did not change anything:
 cman+clvmd still hung on the vgdisplay command if I crashed the peer node.

Does cman think the node is fenced? clvmd will block IO until the node 
is fenced properly.
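
A few cman-side commands that show whether fencing actually completed
(output formats vary by version):

    fence_tool ls      # fence domain state; a pending fence shows here
    cman_tool nodes    # membership as cman sees it
    group_tool ls      # fence/dlm/gfs group states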