Re: [Linux-HA] Need help in setting up a two node cluster in different subnet
Assuming the two systems can already route between each other, just configure udpu as your transport. Example config: http://lists.corosync.org/pipermail/discuss/2011-October/48.html

On Mar 24, 2013, at 11:59 AM, deep saran wrote:
> Hi all,
> I am trying to set up a two-node cluster where the nodes are in different subnets. I am using corosync/pacemaker for this. I am a newbie in this field and have no idea how to configure corosync/pacemaker to work across subnets. My initial investigation suggests it is possible using VIPArip along with ripd/quagga, but I don't understand how to configure them for my networking environment. Can anyone suggest a document or demonstrate with an example? I would be really grateful. Please help me if you have any prior experience in this field. Thanks in advance!
> Regards, Deep

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
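A corosync.conf along the lines of the linked example might look like the following (corosync 1.x-era syntax; all addresses are placeholders). With udpu there is no multicast, so every cluster member is listed explicitly:

```
# /etc/corosync/corosync.conf (fragment; placeholder addresses)
totem {
        version: 2
        transport: udpu

        interface {
                ringnumber: 0
                # network of this node's own interface
                bindnetaddr: 10.10.1.0
                mcastport: 5405

                # with udpu, each cluster member is declared explicitly,
                # so the nodes can live in different subnets
                member {
                        memberaddr: 10.10.1.10
                }
                member {
                        memberaddr: 10.20.2.10
                }
        }
}
```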
Re: [Linux-HA] Many Resources Dependent on One Resource Group
On 3/24/13 12:58 PM, Robinson, Eric wrote:
> In the simplest terms, we currently have resources: A = drbd, B = filesystem, C = cluster IP, D thru J = mysql instances. Resource group G1 consists of resources B through J, in that order, and is dependent on resource A. This fails over fine, but it has the serious disadvantage that if you stop or remove a mysql resource in the middle of the list, all of the ones after it stop too. For example, if you stop G, then H thru J stop as well. We want to change it so that resource group G1 consists only of resources B and C. All of the mysql instances (D thru J) are individually dependent on group G1, but not dependent on each other. That way you can stop or remove a mysql resource without affecting the others. I saw this scenario described in the Pacemaker docs, but I cannot find an example of the syntax.

Try adding this to the group: meta ordered=false

Or you could take the MySQL instances out of the group and make them each individually dependent on drbd/filesystem with a colocation/order ruleset.

Also, you might want to use lrmadmin to set 'max_children' to something more than 4, otherwise it'll only start up 4 MySQL instances at once. Obviously it depends on how much horsepower you have, but we set it to 32 or 64 (although most of our resources are just IPaddr).
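The colocation/order approach could be sketched in crm shell syntax like this (resource and constraint names here are hypothetical; substitute your own):

```
# Keep only filesystem + IP in the group, then tie each mysql
# instance to the group individually, with no ordering between them.
group G1 fs_res ip_res
colocation mysql_d-with-G1 inf: mysql_d G1
order G1-before-mysql_d inf: G1 mysql_d
colocation mysql_e-with-G1 inf: mysql_e G1
order G1-before-mysql_e inf: G1 mysql_e
# ...repeat for the remaining instances (F thru J)
```

With this layout, stopping mysql_d affects neither G1 nor the other instances, since each instance depends only on the group.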
Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker
On 12/1/12 5:46 AM, Hermes Flying wrote:
> Thank you for this! One last thing I need to clear up before digging into your configuration specs etc. Since Pacemaker is a fail-over system rather than a load-balancing system (like Red Hat) as you say, my understanding is that one of my nodes will have the VIP until:
> 1) Tomcat crashes and cannot restart (dead for some reason) -- Pacemaker migrates the VIP
> 2) The network communication with the outside network is cut off -- Pacemaker migrates the VIP
> If these (2) are valid (are they?) then that means there is no primary/backup concept using Pacemaker (since I will assign the VIP to one of my nodes and my installed load balancer will distribute the load among my 2 Tomcats), and as a result there cannot be a split-brain.

In the event of a split brain with Pacemaker, if you don't have any fencing configured, you will end up with your VIP running on both systems. Chances are that in your configuration it won't be a big deal, since your router/firewall/whatever will learn the ARP of one system, so you'll end up routing traffic properly - but it will be unpredictable and difficult to troubleshoot.

> Yet you imply that split-brain can occur even with Pacemaker if I don't have fencing properly set. But how? It seems to me that Pacemaker does not have a notion of primary/backup. Or do you mean something else by fail-over system?

For each resource, Pacemaker knows there is a node where it is running, and 'other' nodes where it is not running (but could, if the node running it failed). So from a resource perspective, there is an active node and one or more backups.

> Additionally, you say that the coordination of Pacemaker instances is done via corosync, which is over network messages, right? So what happens in the event of a communication/network failure that affects only the paths used for corosync coordination, and not the communication path with the clients? Hope this question makes sense as I am new to this.

Split brain. That's why you need a redundant communications network, plus you need fencing.
Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker
On 12/1/12 8:21 AM, Hermes Flying wrote:
> Thanks for your reply. First of all, I didn't get whether the VIP will migrate if Tomcat or the load balancer also fails. It will, right?

If you configure Pacemaker correctly, yes.

> Also, if I understand this correctly, I can end up with the VIP on both nodes if corosync fails due to network failure, and you suggest redundant communication paths to avoid this. But if I understand the problem: if the VIP runs on my linux-1 and Pacemaker is, via corosync, ready to take over on failure from linux-2, and there is a network failure (despite redundant communication paths - unless you recommend some specific topology that is 100% foolproof), how can you detect whether the other node actually crashed or just corosync failed? In this case, won't linux-2 also wake up to take the VIP?

That is what fencing is for. If linux-1 goes offline from the perspective of linux-2, linux-2 will attempt to crash/power-cycle/power-off linux-1 to ensure it is really dead. Any resource previously running on linux-1 will be started on linux-2.

Usually with a two node config I take two NICs on each box and connect them directly to the other one - it would also work if you had two separate switches you could run each path through. Then I use Linux NIC bonding to provide redundancy and run corosync over the bond interface.
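That bonding setup could be sketched like this on a RHEL/CentOS-style system (interface names and addresses are hypothetical; corosync's bindnetaddr would then point at the bond's network):

```
# /etc/sysconfig/network-scripts/ifcfg-bond0 (hypothetical address)
DEVICE=bond0
BONDING_OPTS="mode=active-backup miimon=100"
IPADDR=192.168.100.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth1 (repeat for eth2)
DEVICE=eth1
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```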
Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker
On 12/1/12 8:48 AM, Hermes Flying wrote:
> Great help! Please allow me to trouble you with one last question. If I get this, when I use fencing and corosync fails, then linux-2 will attempt to crash linux-1 and take over. At this point, though, linux-1 won't try to do anything, right? Since it knows it is the primary, I mean.

linux-1 will be powered off or crashed, so I think that speaks for itself.

> Then you say: "Any resource previously running on linux-1 will be started on linux-2." Now at this point: by resource you mean only Pacemaker and its related modules, right? Because I want Tomcat to be up and running and receiving requests on linux-2 as well, which will be forwarded by the load balancer of linux-1. Is this correct?

I mean 'resources managed by Pacemaker'. So if your VIP was running on linux-1, and it fails, and linux-2 fences it, the only place the VIP can run is linux-2. linux-1 is totally down.

> Also, in your setup of 2 NICs or 2 switches, I assume the idea is that the probability of split-brain due to network failure is very low, right? Because I have read that it is not possible to avoid split-brain without adding a third node. But I may be misunderstanding this.

A third node will eliminate split brain by definition, as quorum will only be obtained if a minimum of two nodes are available. If you have a diverse network configuration and good change management, you're probably not going to experience a split brain unless you have a substantial environment failure that will probably impact your clients' ability to access anything. Since you are not running shared storage, you're not going to experience data loss, which is typically the biggest concern with split brain.
Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker
Already told you. If you are running two-node, make sure your fencing works and you have reliable connectivity between nodes. If that isn't good enough, add a third node.

On 12/1/12 8:58 AM, Hermes Flying wrote:
> Actually, each Tomcat uses a back-end database that has the notion of primary/backup. I am trying to figure out whether, by using Pacemaker facilities, I can avoid split-brain in the database as well. So far, from what you described, I seem to get away with it, meaning that through fencing linux-1 will stop, so the secondary database on linux-2 will become primary. Am I on the right track here? If you have any recommendations for my setup (2 linux boxes running: 2 LB / 2 Tomcat / 2 databases) please let me know! Thank you for your time!
>
> *From:* David Coulson da...@davidcoulson.net
> *To:* Hermes Flying flyingher...@yahoo.com
> *Cc:* General Linux-HA mailing list linux-ha@lists.linux-ha.org; Digimer li...@alteeve.ca
> *Sent:* Saturday, December 1, 2012 3:53 PM
> *Subject:* Re: [Linux-HA] Some help on understanding how HA issues are addressed by pacemaker
> [quoted text of the previous message in this thread snipped]
Re: [Linux-HA] Active/Active
From your post on the drbd list it looks like your storage is messed up. Post the output of crm_mon -1fr from both nodes. Guessing your drbd resource didn't promote to master on one node because your drbd cluster is out of sync.

Sent from my iPad

On Aug 2, 2012, at 4:22 AM, Yount, William D yount.will...@menloworldwide.com wrote:
> I am using CentOS 6.2, mainly because that is what the Clusters from Scratch documentation uses. I am using GFS2. Yeah, I had some issues with the cman/corosync/pacemaker startup order at first, but CentOS uses chkconfig, which allowed me to turn corosync off, cman on and pacemaker on. Ever since, I haven't had an issue with the cluster coming up on its own. Previously I had to stop corosync, start cman and then start pacemaker.
>
> I do seem to have some issues with the cluster recognizing when a resource is down, but I believe that may be tied to pacemaker, which I am using for fencing. There is something wrong with my cib.xml configuration. All the resources seem to be docked at Node2. If I cut Node2 off or unplug a network cable from Node2, then my cluster goes down. Even my cloned IP address goes down.
>
> -----Original Message-----
> From: linux-ha-boun...@lists.linux-ha.org On Behalf Of Bruno MACADRE
> Sent: Thursday, August 02, 2012 2:38 AM
> To: linux-ha@lists.linux-ha.org
> Subject: Re: [Linux-HA] Active/Active
>
> Hi,
> I'm working on a cluster of the same type as yours and I still ask myself some questions. Any feedback will be appreciated:
> - What OS did you use? I do a lot of work on Ubuntu 12.04 server, but I'm not satisfied with the cluster stability, mainly with OCFS2.
> - Do you use a clustered filesystem (like OCFS2 or GFS2) on your dual-primary DRBD? Did you have special concerns? I had a lot of concerns about using OCFS2 on Ubuntu server because it needs a cman stack above corosync, which made the entire cluster unstable. For example: when I do 'node standby node1' it kills pacemaker without any possibility of coming back online at all!
> - Have you encountered any obstacles when NFS resources switch during high use?
> I'd gladly take a look at your cib.xml file if you wish.
> Regards, Bruno
>
> On 01/08/2012 11:22, Yount, William D wrote:
>> I was wondering if someone could look over my cib.xml file and see if I need to change anything. I am attempting to create an Active/Active cluster offering up a DRBD volume for an NFS share. Everything works fine as it is, but I would like someone more knowledgeable to look it over at your leisure. Thank you, William

--
Bruno MACADRE
Systems and Network Engineer / Telecom and Network Manager
Department of Computer Science, University of Rouen, France
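The chkconfig-based startup ordering William describes amounts to the following on CentOS 6 (service names assume the stock cman/pacemaker packages):

```
# Disable the standalone corosync init script; the cman init script
# starts corosync itself with the cluster.conf-derived configuration.
chkconfig corosync off
chkconfig cman on
chkconfig pacemaker on
```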
Re: [Linux-HA] Question about Pacemaker Load-balancing
On 6/4/12 7:23 AM, alain.mou...@bull.net wrote:
> Hi. Some questions about HA/Pacemaker and load balancing:
> - Is it possible and safe to make LVS (based on heartbeat software) and Pacemaker/corosync work together?

LVS isn't based on heartbeat. You can use Pacemaker to manage your VIPs, and just leave LVS running in the background. You will need something to monitor your backend systems and update the LVS tables when they fail/recover - ldirectord does a good job of that.

> - Is there another known and efficient way to provide load-balancing functionality on a Pacemaker cluster?

Depends what your application is - we use a mixture of LVS and HAProxy.
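A minimal ldirectord.cf along those lines might look like the following (addresses, ports and the check URL are hypothetical; ldirectord removes a real server from the LVS table when its health check fails and re-adds it on recovery):

```
# /etc/ha.d/ldirectord.cf (hypothetical addresses)
checktimeout=3
checkinterval=5
autoreload=yes
quiescent=no

virtual=192.168.1.100:80
        real=192.168.1.10:80 gate
        real=192.168.1.11:80 gate
        service=http
        request="healthcheck.html"
        receive="OK"
        scheduler=rr
        protocol=tcp
        checktype=negotiate
```

Pacemaker would then manage 192.168.1.100 as a VIP resource while ldirectord maintains the table behind it.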
Re: [Linux-HA] DRBD Concept Doubt
What is your clustering software and what is its configuration? Also post your DRBD configuration and the output of cat /proc/drbd during each stage of the testing which reproduces the issue. Posting some kernel logs would be helpful too. Simply switching pri/sec on DRBD won't cause a node to go Outdated, unless you split-brain the environment.

David

On 5/20/12 9:25 AM, Net Warrior wrote:
> Hi there list! I've got a doubt regarding DRBD usage. At the moment I'm trying to implement an HA system with two nodes; it is an easy and basic setup: two servers, running Oracle and LVM. I configured one resource, let's say /dev/rootvg/myoracle-device, on both. This is working fine; I can perform a manual failover as follows:
>
> drbdadm secondary node1
> umount /dev/drbd1
> drbdadm primary node2
> mount /dev/drbd1
>
> This works fine and I have both servers synchronized. My problem, or doubt, is this: when the other node fails - let's say I power it off - node2 takes the primary role (I do it manually), but the information is Outdated, I lose lots of information, and I have to wait till node1 comes up to synchronize with it. So, does DRBD work like that? I thought DRBD was synchronizing to the other node in the background so that both nodes have the same information. Please correct me if I'm wrong, because maybe this solution is not well implemented/configured, or I misunderstood what DRBD is for. Thanks for your time and support. Best regards
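When collecting the /proc/drbd output David asks for, a small helper like this can make the per-device states easy to compare across test stages (a sketch; the sample text below is hypothetical, in the 8.3-era format):

```python
# Hypothetical helper: pull connection state (cs), roles (ro) and
# disk states (ds) out of the per-device lines of /proc/drbd.
import re

DEVICE_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)\s+ro:(?P<ro>\S+)\s+ds:(?P<ds>\S+)"
)

def parse_proc_drbd(text):
    """Return {minor: {'cs': ..., 'ro': ..., 'ds': ...}} for each device line."""
    devices = {}
    for line in text.splitlines():
        m = DEVICE_RE.match(line)
        if m:
            devices[int(m.group("minor"))] = {
                "cs": m.group("cs"),   # connection state, e.g. Connected
                "ro": m.group("ro"),   # roles, e.g. Primary/Secondary
                "ds": m.group("ds"),   # disk states, e.g. UpToDate/Outdated
            }
    return devices

# Sample output resembling the situation described in the thread:
sample = """\
version: 8.3.11 (api:88/proto:86-96)
 1: cs:WFConnection ro:Primary/Unknown ds:UpToDate/Outdated C r-----
"""
status = parse_proc_drbd(sample)
```

A ds of UpToDate/Outdated with the peer Unknown, as in the sample, is exactly the state worth capturing at each stage of the failover test.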
Re: [Linux-HA] Heartbeat Failover Configuration Question
Why even use heartbeat then - just manually ifconfig the interface.

On 4/23/12 7:39 AM, Net Warrior wrote:
> Hi Nikita. This is the version: heartbeat-3.0.0-0.7. My aim is that if node1 is powered off or loses its ethernet connection, node2 won't make the failover automatically; I want to make it manually, but I could not find how to accomplish that. Thanks for your time and support. Best regards
>
> 2012/4/23, Nikita Michalko michalko.sys...@a-i-p.com:
>> Hi, Net Warrior! What version of HA/Pacemaker do you use? Did you already RTFM - e.g. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained - or: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch
>> HTH, Nikita Michalko
>>
>> On Monday, 23 April 2012 02:23:20, Net Warrior wrote:
>>> Hi there. I configured heartbeat to fail over an IP address; if I, for example, shut down one node, the other takes its IP address. So far so good. Now my doubt is whether there is a way to configure it not to make the failover automatically, and have someone run the failover manually. Can you provide any configuration example please? Is this stanza the one that does the magic? auto_failback on
>>> Thanks for your time and support. Best regards
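Moving the address by hand, as suggested above, amounts to something like this (address and interface name are hypothetical; the gratuitous ARP tells peers where the IP now lives):

```
# On the node taking over the address:
ip addr add 192.168.1.50/24 dev eth0
arping -U -I eth0 -c 3 192.168.1.50   # gratuitous ARP to update neighbor caches

# On the node giving up the address:
ip addr del 192.168.1.50/24 dev eth0
```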
Re: [Linux-HA] clvm/dlm/gfs2 hangs if a node crashes
On 3/22/12 2:43 PM, William Seligman wrote:
> I still haven't solved the problem, but this advice has gotten me further than before. First, Lars was correct: I did not have execute permissions set on my fence-peer scripts. (D'oh!) I turned them on, but that did not change anything: cman+clvmd still hung on the vgdisplay command if I crashed the peer node.

Does cman think the node is fenced? clvmd will block IO until the node is fenced properly.
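To check what cman thinks, a few standard cman/fence utilities on RHEL-style clusters can help (a sketch; run on the surviving node after crashing the peer):

```
# Show fence domain membership and whether a fence operation is pending
fence_tool ls

# Show cluster node status as cman sees it
cman_tool nodes

# Look for fencing activity (or failures) in the logs
grep -i fence /var/log/messages
```

If the crashed node never shows as fenced, clvmd will keep blocking, which would explain the vgdisplay hang.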