Re: [Linux-HA] Heartbeat packages for Redhat-7
Thank you Lars for clarifying. I will see what I can use in our environment.

Thank you again,
Yogi

On Fri, Apr 3, 2015 at 11:03 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote:
> On Wed, Apr 01, 2015 at 12:16:38PM +0100, Yogendramummaneni Prasad wrote:
>> Hello,
>>
>> At the moment, we are using heartbeat on RedHat 5.8 with the packages below:
>>
>> [root@p118278vaps2011 ~]# rpm -qa | grep heartbeat
>> heartbeat-2.1.4-11.el5
>> heartbeat-stonith-2.1.4-11.el5
>> heartbeat-pils-2.1.4-11.el5
>>
>> We are now planning to upgrade the OS to RedHat 7.0. I could not find the same heartbeat packages for RedHat 7.0 on the internet. Could you please confirm whether heartbeat packages are available for RedHat 7.0?
>
> Those versions are almost seven years old.
>
> You can use heartbeat 3.0.6 (if you only use haresources mode).
>
> If you use crm mode, you need to realize that the crm component was split off into its own project years ago: Pacemaker.
>
> For crm mode, if you want to stick with heartbeat, use heartbeat 3.0.6 and Pacemaker 1.1.12 (with LINBIT patches), or Pacemaker 1.1.13 (soon to be released, including those patches).
>
> If you don't have any particular reason to keep using heartbeat, the recommended cluster stack is Corosync + Pacemaker, which is what you get with the RHEL 7 native HA cluster.
>
> For more about Pacemaker, visit clusterlabs.org, subscribe to us...@clusterlabs.org, or join #clusterlabs on freenode.
>
> --
> : Lars Ellenberg
> : http://www.LINBIT.com | Your Way to High Availability
> : DRBD, Linux-HA and Pacemaker support and consulting
>
> DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

The Linux-HA mailing list is closing down. Please subscribe to us...@clusterlabs.org instead.
http://clusterlabs.org/mailman/listinfo/users
Re: [Linux-HA] Heartbeat packages for Redhat-7
On 2015-04-03 17:03, Lars Ellenberg wrote:
> You can use heartbeat 3.0.6 (if you only use haresources mode).

You can google for the ticket number, but basically the EPEL heartbeat maintainer replied to my RFA with "I don't use heartbeat anymore", so no. I meant to post that here but forgot.

So there is no heartbeat RPM for EL7 in the usual repos. Does Clusterlabs have one?

Dimitri
Re: [Linux-HA] Heartbeat packages for Redhat-7
On Wed, Apr 01, 2015 at 12:16:38PM +0100, Yogendramummaneni Prasad wrote:
> Hello,
>
> At the moment, we are using heartbeat on RedHat 5.8 with the packages below:
>
> [root@p118278vaps2011 ~]# rpm -qa | grep heartbeat
> heartbeat-2.1.4-11.el5
> heartbeat-stonith-2.1.4-11.el5
> heartbeat-pils-2.1.4-11.el5
>
> We are now planning to upgrade the OS to RedHat 7.0. I could not find the same heartbeat packages for RedHat 7.0 on the internet. Could you please confirm whether heartbeat packages are available for RedHat 7.0?

Those versions are almost seven years old.

You can use heartbeat 3.0.6 (if you only use haresources mode).

If you use crm mode, you need to realize that the crm component was split off into its own project years ago: Pacemaker.

For crm mode, if you want to stick with heartbeat, use heartbeat 3.0.6 and Pacemaker 1.1.12 (with LINBIT patches), or Pacemaker 1.1.13 (soon to be released, including those patches).

If you don't have any particular reason to keep using heartbeat, the recommended cluster stack is Corosync + Pacemaker, which is what you get with the RHEL 7 native HA cluster.

For more about Pacemaker, visit clusterlabs.org, subscribe to us...@clusterlabs.org, or join #clusterlabs on freenode.

--
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA and Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Linux-HA] Heartbeat packages for Redhat-7
On 03/04/15 06:03 PM, Lars Ellenberg wrote:
> On Wed, Apr 01, 2015 at 12:16:38PM +0100, Yogendramummaneni Prasad wrote:
>> Could you please confirm whether heartbeat packages are available for RedHat 7.0?
>
> [snip: full reply quoted earlier in this thread]
>
> If you don't have any particular reason to keep using heartbeat, the recommended cluster stack is Corosync + Pacemaker, which is what you get with the RHEL 7 native HA cluster.

To expand on / provide background to Lars' answer:

https://alteeve.ca/w/History_of_HA_Clustering

Also, please subscribe to the Clusterlabs mailing list, as shown in Lars' footer.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
"What if the cure for cancer is trapped in the mind of a person without access to education?"
Re: [Linux-HA] heartbeat
Hi Dimitri,

Yes, there are 4 pairs, but they are all active. When a node fails, the other one in the pair just takes everything over. A different HA stack is not an option; it has to be heartbeat.

I noticed something called ethmonitor; I can probably monitor an IB connection with it (it has IPoIB on it).

On 01/20/2015 12:51 PM, Dimitri Maziuk wrote:
> On 01/20/2015 01:34 PM, Ron Croonenberg wrote:
>> Hello,
>>
>> I have an Ethernet connection that connects all hosts in a cluster, and the nodes also have an IB connection. I want the failover host to take over when an IB connection goes down on a host. Is there an example of how to do this? (I am using IPMI for shutting down hosts etc.)
>>
>> The cluster I am using has 8 nodes and I want to do failover in pairs of two. In the ha.cf file, do I mention all the hosts, or just the host and its failover partner, per pair?
>
> Do you have 4 separate active-passive pairs or a cluster of 8 nodes? If it's the latter, I think you want pacemaker, not heartbeat.
>
> Dunno what pacemaker might have for monitoring an IB connection; with heartbeat R1 I'd do something like grep for "LinkUp" in the output of ibstat.
Re: [Linux-HA] heartbeat
On 01/20/2015 01:34 PM, Ron Croonenberg wrote:
> Hello,
>
> I have an Ethernet connection that connects all hosts in a cluster, and the nodes also have an IB connection. I want the failover host to take over when an IB connection goes down on a host. Is there an example of how to do this? (I am using IPMI for shutting down hosts etc.)
>
> The cluster I am using has 8 nodes and I want to do failover in pairs of two. In the ha.cf file, do I mention all the hosts, or just the host and its failover partner, per pair?

Do you have 4 separate active-passive pairs or a cluster of 8 nodes? If it's the latter, I think you want pacemaker, not heartbeat.

Dunno what pacemaker might have for monitoring an IB connection; with heartbeat R1 I'd do something like grep for "LinkUp" in the output of ibstat.

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
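Dimitri's suggestion (grep the output of ibstat for "LinkUp") could be sketched as a small shell helper. This is a hypothetical script, not from the thread; the exact "Physical state" field name may vary between OFED versions, so verify it against your own `ibstat` output before wiring it into a monitor:

```shell
#!/bin/sh
# Sketch of an IB link check for a heartbeat R1 setup, per the
# suggestion above. The "LinkUp" string is what ibstat typically
# prints for an active port; treat the field name as an assumption.

ib_link_up() {
    # Reads ibstat-style output on stdin; succeeds if any port is LinkUp.
    grep -q 'LinkUp'
}

# Typical use (only meaningful where ibstat is installed):
#   if ibstat | ib_link_up; then echo "IB link OK"; else echo "IB link DOWN"; fi
```

A monitor like this could then trigger a failover action (e.g. a local heartbeat standby) when the check fails repeatedly.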
Re: [Linux-HA] Heartbeat in Amazon VMs does not create virtual IP address
----- Original Message -----
> Hi,
>
> I installed HeartBeat on CentOS 6.5 on 2 Amazon EC2 machines. This is the

If you have the option, I'd strongly recommend using the Pacemaker+CMAN stack in RHEL 6.5. Red Hat began supporting pacemaker in 6.5, so it should be available to you.

-- Vossel

> version:
>
> [root@ip-10-0-2-68 ha.d]# rpm -qa | grep heartbeat
> heartbeat-libs-3.0.4-2.el6.x86_64
> heartbeat-3.0.4-2.el6.x86_64
> heartbeat-devel-3.0.4-2.el6.x86_64
>
> The floating IP is:
>
> [root@ip-10-0-2-68 ha.d]# cat haresources
> ip-10-0-2-68 10.0.2.70
>
> but it is not created on either machine; it does not matter where I run the takeover or standby commands. What am I missing? Is this even possible?
>
> These are my settings in ha.cf (node 1, then node 2):
>
> logfacility local0
> ucast eth0 10.0.2.69
> auto_failback on
> node ip-10-0-2-68 ip-10-0-2-69
> ping 10.0.2.1
> use_logd yes
>
> logfacility local0
> ucast eth0 10.0.2.68
> auto_failback on
> node ip-10-0-2-68 ip-10-0-2-69
> ping 10.0.2.1
> use_logd yes
>
> This is the output of the route command:
>
> [root@ip-10-0-2-68 ha.d]# route -n
> Kernel IP routing table
> Destination  Gateway   Genmask        Flags Metric Ref Use Iface
> 10.0.2.0     0.0.0.0   255.255.255.0  U     0      0   0   eth0
> 0.0.0.0      10.0.2.1  0.0.0.0        UG    0      0   0   eth0
>
> This is how interface eth0 is set up on machine 1:
>
> [root@ip-10-0-2-68 ha.d]# ifconfig
> eth0  Link encap:Ethernet  HWaddr 12:23:49:EF:3A:53
>       inet addr:10.0.2.68  Bcast:10.0.2.255  Mask:255.255.255.0
>       inet6 addr: fe80::1023:49ff:feef:3a53/64 Scope:Link
>       UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
>       RX packets:269823 errors:0 dropped:0 overruns:0 frame:0
>       TX packets:192305 errors:0 dropped:0 overruns:0 carrier:0
>       collisions:0 txqueuelen:1000
>       RX bytes:167802149 (160.0 MiB)  TX bytes:48341828 (46.1 MiB)
>       Interrupt:247
>
> These are the logs, showing everything going on fine, but when doing ifconfig the interface is not there:
>
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: node ip-10-0-2-69: is dead
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Comm_now_up(): updating status to active
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Local status now set to: 'active'
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: No STONITH device configured.
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: Shared disks are not protected.
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Resources being acquired from ip-10-0-2-69.
> Nov 11 21:37:39 ip-10-0-2-68 mach_down(default)[14769]: info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: mach_down takeover complete.
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Initial resource acquisition complete (mach_down)
> Nov 11 21:37:39 ip-10-0-2-68 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[14845]: INFO: Resource is stopped
> Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14701]: [14701]: info: Local Resource acquisition completed.
> Nov 11 21:37:40 ip-10-0-2-68 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[14958]: INFO: Resource is stopped
> Nov 11 21:37:40 ip-10-0-2-68 IPaddr(IPaddr_10.0.2.70)[15057]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.0.2.70 eth0 10.0.2.70 auto not_used not_used
> Nov 11 21:37:40 ip-10-0-2-68 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[15064]: INFO: Success
> Nov 11 21:37:49 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Local Resource acquisition completed. (none)
> Nov 11 21:37:49 ip-10-0-2-68 heartbeat[14681]: [14681]: info: local resource transition completed.
> Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: node ip-10-0-2-68: is dead
> Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Comm_now_up(): updating status to active
> Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Local status now set to: 'active'
> Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: No STONITH device configured.
> Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: Shared disks are not protected.
> Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Resources being acquired from ip-10-0-2-68.
> Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18360]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys ip-10-0-2-69] to acquire.
> Nov 11 21:38:17 ip-10-0-2-69 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[18441]: INFO: Resource is stopped
> Nov 11 21:38:17 ip-10-0-2-69 IPaddr(IPaddr_10.0.2.70)[18537]: INFO: /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p /var/run/resource-agents/send_arp-10.0.2.70 eth0 10.0.2.70 auto not_used not_used
> Nov 11 21:38:17 ip-10-0-2-69
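A likely contributing factor here (an observation of mine, not stated in the thread): inside an EC2 VPC the network fabric does not honour the gratuitous ARP that IPaddr's `send_arp` relies on, so a secondary IP must also be moved at the EC2 API level. A minimal, hedged sketch of that step follows; it assumes the `aws` CLI is installed with suitable IAM permissions, and the ENI ID is a placeholder:

```shell
#!/bin/sh
# Sketch: reassign the floating IP via the EC2 API on takeover,
# since gratuitous ARP alone is not honoured inside a VPC.
# ENI_ID is a placeholder for the local node's network interface.

FLOATING_IP=10.0.2.70
ENI_ID=eni-xxxxxxxx   # placeholder; look up with `aws ec2 describe-network-interfaces`

reassign_cmd() {
    # Build the reassignment command; --allow-reassignment lets the IP
    # move away from the peer node that currently holds it.
    echo "aws ec2 assign-private-ip-addresses" \
         "--network-interface-id $ENI_ID" \
         "--private-ip-addresses $FLOATING_IP" \
         "--allow-reassignment"
}

# In a takeover hook you would execute it, e.g.:  eval "$(reassign_cmd)"
reassign_cmd
```

The script only prints the command so it can be reviewed; in a real resource agent you would run it and check the exit status.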
Re: [Linux-HA] heartbeat 3.0.3 crashes if there are networking/multicast issues (ERROR: lowseq cannnot be greater than ackseq)
On Thu, Jun 26, 2014 at 01:30:01PM +0200, Lars Ellenberg wrote:
> On Tue, Jun 24, 2014 at 11:20:48PM +0300, Pasi Kärkkäinen wrote:
>> Hello!
>>
>> I've been seeing heartbeat cluster problems in Linux-based Vyatta and more recent VyOS networking/router appliances. These are currently based on Debian Squeeze, and are thus using:
>>
>> Package: heartbeat
>> Version: 1:3.0.3-2
>
> Please use 3.0.5:
> http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/37f57a36a2dd.tar.bz2

Do you think 3.0.5 fixes the issue of the heartbeat process crashing? This patch, perhaps?
http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/3e51db646a21

Thanks,

-- Pasi

>> VyOS bug report: http://bugzilla.vyos.net/show_bug.cgi?id=244
>>
>> The problem is that when there are (unexpected) networking problems causing multicast issues, which disrupt the inter-cluster communication, the heartbeat processes die on the cluster nodes, which is bad, right? I assume heartbeat should never die, especially not because of temporary networking issues. I've also seen heartbeat dying because of temporary network maintenance breaks.
>>
>> Basically, first I'm seeing this kind of message:
>>
>> [snip: "node vyos01: is dead" / "Cluster node vyos01 returning after partition" log excerpts, quoted in full in the original message in this thread]
>>
>> which seems normal in the case of a networking problem. But then later the "Message hist queue is filling up" errors start:
>>
>> [snip: log excerpts, 494 up to 500 messages in queue]
>>
>> The hist queue size keeps increasing, and when it gets to 500 messages bad things start happening:
>>
>> [snip: "lowseq cannnot be greater than ackseq" / "Emergency Shutdown" log excerpts]
>>
>> At this point clustering has failed, because the heartbeat services/processes aren't running anymore.
>>
>> Has anyone else seen this?
>
> It has been fixed years ago ...

>> It seems the bug gets triggered at 500 messages in the hist queue, and then I always see the "ERROR: lowseq cannnot be greater than ackseq" and then heartbeat dies.
Re: [Linux-HA] heartbeat 3.0.3 crashes if there are networking/multicast issues (ERROR: lowseq cannnot be greater than ackseq)
On Tue, Jun 24, 2014 at 11:20:48PM +0300, Pasi Kärkkäinen wrote:
> Hello!
>
> I've been seeing heartbeat cluster problems in Linux-based Vyatta and more recent VyOS networking/router appliances. These are currently based on Debian Squeeze, and are thus using:
>
> Package: heartbeat
> Version: 1:3.0.3-2

Please use 3.0.5:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/37f57a36a2dd.tar.bz2

> VyOS bug report: http://bugzilla.vyos.net/show_bug.cgi?id=244
>
> The problem is that when there are (unexpected) networking problems causing multicast issues, which disrupt the inter-cluster communication, the heartbeat processes die on the cluster nodes, which is bad, right? I assume heartbeat should never die, especially not because of temporary networking issues. I've also seen heartbeat dying because of temporary network maintenance breaks.
>
> Basically, first I'm seeing this kind of message:
>
> Jun 23 17:55:02 vyos03 heartbeat: [4119]: WARN: node vyos01: is dead
> Jun 23 17:59:23 vyos03 heartbeat: [4119]: CRIT: Cluster node vyos01 returning after partition.
> Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Deadtime value may be too small.
> Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Late heartbeat: Node vyos01: interval 273580 ms
> Jun 23 17:59:23 vyos03 harc[4961]: info: Running /etc/ha.d//rc.d/status status
> Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Releasing resource group: vyos01 IPaddr2-vyatta::10.0.0.10/24/eth1
> Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Running /etc/ha.d/resource.d/IPaddr2-vyatta 10.0.0.10/24/eth1 stop
> Jun 23 17:59:26 vyos03 heartbeat: [4119]: WARN: 1 lost packet(s) for [vyos01] [421:423]
> Jun 23 17:59:39 vyos03 heartbeat: [4119]: WARN: Logging daemon is disabled --enabling logging daemon is recommended
> Jun 23 17:59:40 vyos03 harc[5102]: info: Running /etc/ha.d//rc.d/status status
>
> which seems normal in the case of a networking problem. But then later:
>
> Jun 23 19:31:22 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (494 messages in queue)
> Jun 23 19:31:22 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (495 messages in queue)
> Jun 23 19:31:23 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (496 messages in queue)
> Jun 23 19:31:24 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (497 messages in queue)
> Jun 23 19:31:24 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (498 messages in queue)
> Jun 23 19:31:25 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (499 messages in queue)
> Jun 23 19:31:26 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (500 messages in queue)
> Jun 23 19:31:42 vyos03 heartbeat: last message repeated 25 times
>
> The hist queue size keeps increasing, and when it gets to 500 messages bad things start happening:
>
> Jun 23 19:31:43 vyos03 heartbeat: [10921]: ERROR: Message hist queue is filling up (500 messages in queue)
> Jun 23 19:31:49 vyos03 heartbeat: last message repeated 9 times
> Jun 23 19:31:49 vyos03 heartbeat: [10921]: ERROR: lowseq cannnot be greater than ackseq
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Emergency Shutdown: Master Control process died.
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10921 with SIGTERM
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10924 with SIGTERM
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10925 with SIGTERM
> Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Emergency Shutdown(MCP dead): Killing ourselves.
>
> At this point clustering has failed, because the heartbeat services/processes aren't running anymore.
>
> Has anyone else seen this?

It has been fixed years ago ...

> It seems the bug gets triggered at 500 messages in the hist queue, and then I always see the "ERROR: lowseq cannnot be greater than ackseq" and then heartbeat dies.

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Linux-HA] Heartbeat Supported Version
You should email Linbit (http://linbit.com), as they're the company that still supports the heartbeat package.

That said, if you are starting a new project, I strongly urge you to consider corosync + pacemaker. The heartbeat project has not been actively developed in quite some time, and there are no plans to restart development in the future.

Back in the day, heartbeat was one separate platform and Red Hat's RHCS was another. Over the years, this caused confusion and a lot of reinventing the wheel, so the two communities started work on merging into one common platform. The result is corosync + pacemaker, which is what all major developers are supporting from here on in.

If you're curious about the more detailed story, I've got a (still in progress) history here:

https://alteeve.ca/w/History_of_HA_Clustering

Again, it's not complete, but it does give a fairly good background on why heartbeat is not recommended anymore. It *is* still supported by Linbit, though, so I'm not saying not to use it. Just consider the future. :)

digimer

On 02/06/14 11:07 AM, Venkata G Thota wrote:
> Hello,
>
> In our project we have a heartbeat cluster with version heartbeat-2.1.4-0.24.9. Is this a supported version? Kindly assist with how to get support for heartbeat cluster issues.
>
> Regards,
> Venkata G Thota
> DLF IT PARK, GTS Services Delivery - India, Chennai
> UNIX Administrator, India
> Phone: +91 44 434 25397  Mobile: +91 99625 48884
> e-mail: venkt...@in.ibm.com
> Red Hat Certified Engineer

--
Digimer
Papers and Projects: https://alteeve.ca/w/
Re: [Linux-HA] Heartbeat Supported Version
On 2014-06-02T20:37:59, Venkata G Thota venkt...@in.ibm.com wrote:
> Hello,
>
> In our project we have a heartbeat cluster with version heartbeat-2.1.4-0.24.9. Is this a supported version? Kindly assist with how to get support for heartbeat cluster issues.

That looks like a fairly old heartbeat version from SUSE Linux Enterprise Server 10 SP4. SLES 10 has been out of general support since July 2013, but extended support (https://www.suse.com/support/lc-faq.html#2) or LTSS is still available.

Alternatively, the best option you'd have is to upgrade to SLES + HA 11 SP3.

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] Heartbeat Supported Version
On 2014-06-02T12:04:23, Digimer li...@alteeve.ca wrote:
> You should email Linbit (http://linbit.com), as they're the company that still supports the heartbeat package.

For completeness, I doubt Linbit will support this version, since 2.1.4 from SLES 10 contains a number of backports from the pacemaker 0.7/1.0 series. While the source code is obviously available, I'd not suggest inflicting this on Linbit ;-)

Regards,
Lars
Re: [Linux-HA] Heartbeat Supported Version
On 02/06/14 06:30 PM, Lars Marowsky-Bree wrote:
> For completeness, I doubt Linbit will support this version, since 2.1.4 from SLES 10 contains a number of backports from the pacemaker 0.7/1.0 series. While the source code is obviously available, I'd not suggest inflicting this on Linbit ;-)

I didn't think they would support it, but I wanted to leave that for Linbit to say (I've been wrong enough times before...)

--
Digimer
Papers and Projects: https://alteeve.ca/w/
Re: [Linux-HA] Heartbeat Supported Version
On Mon, Jun 02, 2014 at 06:31:09PM -0400, Digimer wrote:
> On 02/06/14 06:30 PM, Lars Marowsky-Bree wrote:
>> For completeness, I doubt Linbit will support this version, since 2.1.4 from SLES 10 contains a number of backports from the pacemaker 0.7/1.0 series.
>
> I didn't think they would support it, but I wanted to leave that for Linbit to say (I've been wrong enough times before...)

Thanks, both of you ;-)

Yes, we maintain heartbeat. We still occasionally bugfix (or even enhance) it if necessary. That does not mean we support each and every legacy version of it that happened to be bundled with some distribution at some point. That's probably a matter of time and money, and political constraints ...

If you really have to stay with whatever platform you have right now, but need it to be supported for a very long term: as that is a SUSE platform, ask SUSE what they can offer.

If you are basically happy with what you have, but need just this one snag fixed, describe your problem, and maybe someone will be able to tell you what to do (but that won't go without mentioning in every second sentence that you should probably upgrade).

If you are about to set up a new cluster, go with current software. Heartbeat itself is currently 3.something, in fact pending a new release tag since ages... Apart from a few but important bugfixes and some minor improvements to its inner workings, the main difference between heartbeat 2.x and heartbeat 3 is that the crm (cluster resource manager) part of it was split out (years ago) and became Pacemaker.

Depending on what you have now, what you are used to, what you feel most comfortable with, and what you want to achieve, I see several options:

* You are used to haresources, and in fact want to keep using it
  => use current heartbeat 3.x packages, and keep doing whatever you did until now.

* You have been using crm mode with heartbeat 2.x, or at least you now want to start using it
  => you should upgrade to Pacemaker, which is just the natural evolution of the heartbeat crm component, even with the same lead developer still. *Several years* of evolution and improvements, in fact. You have further options now:
  * Keep the cluster communication and membership layer: heartbeat (3.x) + pacemaker.
  * Change the cluster communication and membership layer: corosync (2.x) + pacemaker (and more, like cman and corosync 1.x...).

Recommendation for new clusters: go with pacemaker (1.1.12 will be released soon) and corosync (2.3.3 is it now?). That's also about what you will get with current distributions (RHEL 7, SLES 12). (Though we at Linbit are still happy with heartbeat + pacemaker as well.)

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
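[Editor's note: for readers unfamiliar with the two modes Lars contrasts, haresources (R1) mode is configured with one line per resource group in /etc/ha.d/haresources, unchanged between heartbeat 2.x and 3.x. A minimal sketch, with a placeholder node name, address, and service:

```
# /etc/ha.d/haresources -- one resource group per line:
# preferred node, then resources started left to right.
# "node1", the IP, and "httpd" are placeholders, not from this thread.
node1 IPaddr::192.168.85.3/24/eth0 httpd
```

In crm/Pacemaker mode this file is unused; the equivalent resources are defined in the CIB instead.]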
Re: [Linux-HA] Heartbeat Supported Version
On 02/06/14 07:05 PM, Lars Ellenberg wrote:
> (Though we at Linbit are still happy with heartbeat + pacemaker as well.)

Heathens! HEATHENS!!

;)

--
Digimer
Papers and Projects: https://alteeve.ca/w/
Re: [Linux-HA] heartbeat failover
Hello Arnold,

Yes, I recently found out that the sync rate was too high for our old firewall. There are two datacenters, and all traffic is routed through this firewall. I don't know exactly why; this is the concept somehow.

Do you know how to force another IP address on the other side? In heartbeat I was able to say that the cluster IP is a different one than on the other node. In corosync/pacemaker I can't find such an example.

Best regards,
Björn

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Arnold Krille
Sent: Saturday, 25 January 2014 01:46
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover

On Thu, 23 Jan 2014 16:45:04 +0000, bjoern.bec...@easycash.de wrote:
> Uhhh... I got the same configuration as the example config you sent me now. But I cause high CPU load on our Cisco ASA firewall. I guess this traffic is not normal?
<snip>

When you want your cluster to repair failures _fast_, the components have to sync their state _fast_. So they have to talk a lot: not in terms of megabytes, but in terms of small packets with low submission latency. So yes, that traffic is normal.

Why is there a firewall between your nodes on the network where the cluster traffic happens?

Have fun,
Arnold
Re: [Linux-HA] heartbeat failover
Hello, I am running corosync with success now. But I have a problem, because I have two different subnets and I don't know which ClusterIP I have to use. I have 10.128.61.0 and 10.128.62.0, so a ClusterIP like 10.128.61.61 will not be routed in 10.128.62.0. How can I use different ClusterIPs per side? Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Becker, Björn Sent: Thursday, 23 January 2014 17:45 To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] heartbeat failover
Uhhh.. I got the same configuration as the example config you sent me now. But I cause high CPU load on our Cisco ASA firewall.. I guess this traffic is not normal?
root@node01:/etc/corosync# tcpdump dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:06.093140 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.097327 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.113418 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.286517 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.291095 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.480221 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.484520 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.500608 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.673721 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.678654 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.867757 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.872492 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.888576 IP node01.52580 > node02.5405: UDP, length 82
17:41:07.061664 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.066304 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.255409 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.260512 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.275601 IP node01.52580 > node02.5405: UDP, length 82
Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Becker, Björn Sent: Thursday, 23 January 2014 17:28 To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] heartbeat failover
Hi Lukas, thank you. Well, I have to wait for some firewall changes for 5405 UDP. But I'm not sure if it's correct what I'm doing.
Node1:
interface {
    member {
        memberaddr: 10.128.61.60 # node 1
    }
    member {
        memberaddr: 10.128.62.60 # node 2
    }
    # The following values need to be set based on your environment
    ringnumber: 0
    bindnetaddr: 10.128.61.0
    mcastport: 5405
}
transport: udpu
Node2:
interface {
    member {
        memberaddr: 10.128.61.60
    }
    member {
        memberaddr: 10.128.62.60
    }
    # The following values need to be set based on your environment
    ringnumber: 0
    bindnetaddr: 10.128.62.0
    mcastport: 5405
}
transport: udpu
Something definitely seems to be wrong. My firewall was on very high load...
Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lukas Grossar Sent: Thursday, 23 January 2014 16:54 To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] heartbeat failover
Hi Björn Here is an example of how you can set up corosync to use unicast UDP: https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu The important parts are transport: udpu and that you need to configure every member manually using memberaddr: 10.16.35.115. Best regards Lukas
On Thu, 23 Jan 2014 13:36:22 + bjoern.bec...@easycash.de wrote: Hello, thanks a lot! I didn't know that heartbeat is almost deprecated. I'll try corosync and pacemaker, but I read that corosync needs to run over multicast. Unfortunately, I can't use multicast in my network. Do you know any other possibility? I can't find anything about whether corosync can run without multicast.
Best regards Björn -----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer Sent: Wednesday, 22 January 2014 20:36 To: General Linux-HA mailing list Subject: Re: [Linux-HA] heartbeat failover On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote: Hello, I got a drbd+nfs+heartbeat setup and in general it's working. But it takes too long to fail over and I try to tune this. When node 1 is active and I shutdown
Re: [Linux-HA] heartbeat failover
Hello, thanks a lot! I didn't know that heartbeat is almost deprecated. I'll try corosync and pacemaker, but I read that corosync needs to run over multicast. Unfortunately, I can't use multicast in my network. Do you know any other possibility? I can't find anything about whether corosync can run without multicast. Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer Sent: Wednesday, 22 January 2014 20:36 To: General Linux-HA mailing list Subject: Re: [Linux-HA] heartbeat failover
On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote: Hello, I got a drbd+nfs+heartbeat setup and in general it's working. But it takes too long to fail over and I try to tune this. When node 1 is active and I shutdown node 2, then node 1 tries to activate the cluster. The problem is, node 1 already has the primary role, and when re-activating it takes time again, and during this the NFS share isn't available. Is it possible to disable this? Node 1 doesn't have to do anything if it's already in the primary role and the second node is not available. Best regards Björn
If this is a new project, I strongly recommend switching out heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long time, and there are no plans to restart development in the future. Everything (even RH) is standardizing on the corosync+pacemaker stack, so it has the most vibrant community as well. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Linux-HA] heartbeat failover
Hi Björn Here is an example of how you can set up corosync to use unicast UDP: https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu The important parts are transport: udpu and that you need to configure every member manually using memberaddr: 10.16.35.115. Best regards Lukas
On Thu, 23 Jan 2014 13:36:22 + bjoern.bec...@easycash.de wrote: Hello, thanks a lot! I didn't know that heartbeat is almost deprecated. I'll try corosync and pacemaker, but I read that corosync needs to run over multicast. Unfortunately, I can't use multicast in my network. Do you know any other possibility? I can't find anything about whether corosync can run without multicast. Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer Sent: Wednesday, 22 January 2014 20:36 To: General Linux-HA mailing list Subject: Re: [Linux-HA] heartbeat failover
On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote: Hello, I got a drbd+nfs+heartbeat setup and in general it's working. But it takes too long to fail over and I try to tune this. When node 1 is active and I shutdown node 2, then node 1 tries to activate the cluster. The problem is, node 1 already has the primary role, and when re-activating it takes time again, and during this the NFS share isn't available. Is it possible to disable this? Node 1 doesn't have to do anything if it's already in the primary role and the second node is not available. Best regards Björn
If this is a new project, I strongly recommend switching out heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long time, and there are no plans to restart development in the future. Everything (even RH) is standardizing on the corosync+pacemaker stack, so it has the most vibrant community as well. -- Adfinis SyGroup AG Lukas Grossar, System Engineer Keltenstrasse 98 | CH-3018 Bern Tel.
031 550 31 11 | Direkt 031 550 31 06
Re: [Linux-HA] heartbeat failover
Hi Lukas, thank you. Well, I have to wait for some firewall changes for 5405 UDP. But I'm not sure if it's correct what I'm doing.
Node1:
interface {
    member {
        memberaddr: 10.128.61.60 # node 1
    }
    member {
        memberaddr: 10.128.62.60 # node 2
    }
    # The following values need to be set based on your environment
    ringnumber: 0
    bindnetaddr: 10.128.61.0
    mcastport: 5405
}
transport: udpu
Node2:
interface {
    member {
        memberaddr: 10.128.61.60
    }
    member {
        memberaddr: 10.128.62.60
    }
    # The following values need to be set based on your environment
    ringnumber: 0
    bindnetaddr: 10.128.62.0
    mcastport: 5405
}
transport: udpu
Something definitely seems to be wrong. My firewall was on very high load... Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lukas Grossar Sent: Thursday, 23 January 2014 16:54 To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] heartbeat failover
Hi Björn Here is an example of how you can set up corosync to use unicast UDP: https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu The important parts are transport: udpu and that you need to configure every member manually using memberaddr: 10.16.35.115. Best regards Lukas
On Thu, 23 Jan 2014 13:36:22 + bjoern.bec...@easycash.de wrote: Hello, thanks a lot! I didn't know that heartbeat is almost deprecated. I'll try corosync and pacemaker, but I read that corosync needs to run over multicast. Unfortunately, I can't use multicast in my network. Do you know any other possibility? I can't find anything about whether corosync can run without multicast. Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer Sent: Wednesday, 22.
January 2014 20:36 To: General Linux-HA mailing list Subject: Re: [Linux-HA] heartbeat failover On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote: Hello, I got a drbd+nfs+heartbeat setup and in general it's working. But it takes too long to fail over and I try to tune this. When node 1 is active and I shutdown node 2, then node 1 tries to activate the cluster. The problem is, node 1 already has the primary role, and when re-activating it takes time again, and during this the NFS share isn't available. Is it possible to disable this? Node 1 doesn't have to do anything if it's already in the primary role and the second node is not available. Best regards Björn If this is a new project, I strongly recommend switching out heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long time, and there are no plans to restart development in the future. Everything (even RH) is standardizing on the corosync+pacemaker stack, so it has the most vibrant community as well. -- Adfinis SyGroup AG Lukas Grossar, System Engineer Keltenstrasse 98 | CH-3018 Bern Tel. 031 550 31 11 | Direkt 031 550 31 06
Re: [Linux-HA] heartbeat failover
Uhhh.. I got the same configuration as the example config you sent me now. But I cause high CPU load on our Cisco ASA firewall.. I guess this traffic is not normal?
root@node01:/etc/corosync# tcpdump dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:06.093140 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.097327 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.113418 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.286517 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.291095 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.480221 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.484520 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.500608 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.673721 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.678654 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.867757 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.872492 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.888576 IP node01.52580 > node02.5405: UDP, length 82
17:41:07.061664 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.066304 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.255409 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.260512 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.275601 IP node01.52580 > node02.5405: UDP, length 82
Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Becker, Björn Sent: Thursday, 23 January 2014 17:28 To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] heartbeat failover
Hi Lukas, thank you. Well, I have to wait for some firewall changes for 5405 UDP. But I'm not sure if it's correct what I'm doing.
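For perspective, the capture above shows roughly 15 packets per second of 70 to 82 bytes each. A back-of-envelope estimate (the 15 packets/s figure is read off the ~1.2 s capture window, so treat it as approximate) shows how small this traffic actually is:

```shell
# Rough corosync heartbeat bandwidth estimate from the capture above.
# Assumption: ~15 packets/s total, worst case 82 bytes of UDP payload each.
pkts_per_sec=15
bytes_per_pkt=82
echo "$(( pkts_per_sec * bytes_per_pkt * 8 )) bits/s"   # -> 9840 bits/s
```

Under 10 kbit/s, so the packet *rate*, not the bandwidth, is what a stateful firewall inspecting every datagram ends up paying for.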
Node1:
interface {
    member {
        memberaddr: 10.128.61.60 # node 1
    }
    member {
        memberaddr: 10.128.62.60 # node 2
    }
    # The following values need to be set based on your environment
    ringnumber: 0
    bindnetaddr: 10.128.61.0
    mcastport: 5405
}
transport: udpu
Node2:
interface {
    member {
        memberaddr: 10.128.61.60
    }
    member {
        memberaddr: 10.128.62.60
    }
    # The following values need to be set based on your environment
    ringnumber: 0
    bindnetaddr: 10.128.62.0
    mcastport: 5405
}
transport: udpu
Something definitely seems to be wrong. My firewall was on very high load... Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lukas Grossar Sent: Thursday, 23 January 2014 16:54 To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] heartbeat failover
Hi Björn Here is an example of how you can set up corosync to use unicast UDP: https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu The important parts are transport: udpu and that you need to configure every member manually using memberaddr: 10.16.35.115. Best regards Lukas
On Thu, 23 Jan 2014 13:36:22 + bjoern.bec...@easycash.de wrote: Hello, thanks a lot! I didn't know that heartbeat is almost deprecated. I'll try corosync and pacemaker, but I read that corosync needs to run over multicast. Unfortunately, I can't use multicast in my network. Do you know any other possibility? I can't find anything about whether corosync can run without multicast. Best regards Björn
-----Original Message----- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer Sent: Wednesday, 22 January 2014 20:36 To: General Linux-HA mailing list Subject: Re: [Linux-HA] heartbeat failover
On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote: Hello, I got a drbd+nfs+heartbeat setup and in general it's working. But it takes too long to fail over and I try to tune this.
When node 1 is active and I shutdown node 2, then node 1 tries to activate the cluster. The problem is, node 1 already has the primary role, and when re-activating it takes time again, and during this the NFS share isn't available. Is it possible to disable this? Node 1 doesn't have to do anything if it's already in the primary role and the second node is not available. Best regards Björn If this is a new project, I strongly recommend switching out heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long time and there are no plans to restart
Re: [Linux-HA] heartbeat failover
On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote: Hello, I got a drbd+nfs+heartbeat setup and in general it's working. But it takes too long to fail over and I try to tune this. When node 1 is active and I shutdown node 2, then node 1 tries to activate the cluster. The problem is, node 1 already has the primary role, and when re-activating it takes time again, and during this the NFS share isn't available. Is it possible to disable this? Node 1 doesn't have to do anything if it's already in the primary role and the second node is not available. Best regards Björn
If this is a new project, I strongly recommend switching out heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long time, and there are no plans to restart development in the future. Everything (even RH) is standardizing on the corosync+pacemaker stack, so it has the most vibrant community as well. -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Linux-HA] Heartbeat errors related to Gmain_timeout_dispatch at low traffic
Hi Lars, We observed one pattern with these errors: in most cases, on both VMs, these errors came at the same time. We suspect either a network issue (but in that case only late heartbeat errors would come, not Gmain_timeout_dispatch related errors, right?) or that the VM is getting paused for some time for some reason, and when it is resumed the Gmain_timeout_dispatch/late heartbeat errors appear. We are investigating more on this. @heartbeat 3: for this issue, most of the time the advice given was to upgrade. But we are using the same heartbeat version in other setups also, and it is working fine there. What do you think? Regards, Savita
On Tue, Nov 19, 2013 at 4:23 PM, Lars Ellenberg lars.ellenb...@linbit.com wrote: On Thu, Nov 14, 2013 at 04:46:16PM +0530, Savita Kulkarni wrote: Hi, Recently we are seeing lots of heartbeat errors related to Gmain_timeout_dispatch on our system. I checked the mailing list archives to see if other people have faced this issue. There are a few email threads about it, but people are seeing this issue in case of high load. On our system very low/no load is present. We are running heartbeat on guest VMs, using VMware ESXi 5.0. We have heartbeat-2.1.3-4. It is working fine without any issues on other setups; the issue is coming only on this setup. The following types of errors are present in /var/log/messages:
Nov 12 09:58:43 heartbeat: [23036]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 15270 ms (> 1010 ms) before being called (GSource: 0x138926b8)
Nov 12 09:59:00 heartbeat: [23036]: info: Gmain_timeout_dispatch: started at 583294569 should have started at 583293042
Nov 12 09:59:00 heartbeat: [23036]: WARN: Gmain_timeout_dispatch: Dispatch function for update msgfree count was delayed 33960 ms (> 1 ms) before being called (GSource: 0x13892f58)
Can anyone tell me what the issue can be? Can it be a hardware issue? Could be many things, even that, yes. Could be that upgrading to recent heartbeat 3 helps.
Could be that there is too little load, and your virtualization just stops scheduling the VM itself, because it thinks it is underutilized... Does it recover if you kill/restart heartbeat? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
Re: [Linux-HA] Heartbeat errors related to Gmain_timeout_dispatch at low traffic
On Thu, Nov 14, 2013 at 04:46:16PM +0530, Savita Kulkarni wrote: Hi, Recently we are seeing lots of heartbeat errors related to Gmain_timeout_dispatch on our system. I checked the mailing list archives to see if other people have faced this issue. There are a few email threads about it, but people are seeing this issue in case of high load. On our system very low/no load is present. We are running heartbeat on guest VMs, using VMware ESXi 5.0. We have heartbeat-2.1.3-4. It is working fine without any issues on other setups; the issue is coming only on this setup. The following types of errors are present in /var/log/messages:
Nov 12 09:58:43 heartbeat: [23036]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 15270 ms (> 1010 ms) before being called (GSource: 0x138926b8)
Nov 12 09:59:00 heartbeat: [23036]: info: Gmain_timeout_dispatch: started at 583294569 should have started at 583293042
Nov 12 09:59:00 heartbeat: [23036]: WARN: Gmain_timeout_dispatch: Dispatch function for update msgfree count was delayed 33960 ms (> 1 ms) before being called (GSource: 0x13892f58)
Can anyone tell me what the issue can be? Can it be a hardware issue? Could be many things, even that, yes. Could be that upgrading to recent heartbeat 3 helps. Could be that there is too little load, and your virtualization just stops scheduling the VM itself, because it thinks it is underutilized... Does it recover if you kill/restart heartbeat? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
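To quantify how long the VM is being stalled, the delay values can be pulled straight out of those warnings. A quick sketch, run here against one sample line from the excerpt above (on a real system, feed /var/log/messages through the same pipeline and sort -n to find the worst stall):

```shell
# Sample heartbeat warning, taken from the log excerpt above
line='Nov 12 09:58:43 heartbeat: [23036]: WARN: Gmain_timeout_dispatch: Dispatch function for send local status was delayed 15270 ms'

# Extract the delay in milliseconds from the "delayed N ms" phrase
echo "$line" | grep -o 'delayed [0-9]* ms' | awk '{print $2}'   # -> 15270
```

A 15-second stall on an otherwise idle VM points at the hypervisor descheduling the guest, which matches Lars's suggestion above.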
Re: [Linux-HA] Heartbeat v1 and stonith/stonith_host ipmilan
On Wed, Jul 17, 2013 at 6:03 PM, Martin Langhoff martin.langh...@gmail.com wrote: But the 'stonith' script/binary and the scripts that the old documentation indicates aren't there anymore (when I install on RHEL 6.4). Configuring stonith_host external foo bar baz led me in the right direction. heartbeat knows what to do, but on RHEL/CentOS/SL 6.x cluster-glue no longer includes stonith agents. Some info at http://www.gossamer-threads.com/lists/linuxha/pacemaker/74487 So I rebuilt the RPMs for cluster-glue, reversing that removal. It is a dicey proposition, of course, to set up a cluster that I expect to be long-lived based on software that folks are running to deprecate. But I have played with corosync + pacemaker extensively, and TBH they are way overkill for a simple setup. Is there a _simple_ setup guide for a two-node cluster? Y'know, LVM, a couple of mountpoints, one server daemon (mysql)? I am not afraid of complexity; but I like to pick where to invest in complexity :-) cheers, m -- martin.langh...@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff
Re: [Linux-HA] Heartbeat v1 and stonith/stonith_host ipmilan
On 17/07/13 20:43, Martin Langhoff wrote: On Wed, Jul 17, 2013 at 6:03 PM, Martin Langhoff martin.langh...@gmail.com wrote: But the 'stonith' script/binary and the scripts that the old documentation indicates aren't there anymore (when I install on RHEL 6.4). Configuring stonith_host external foo bar baz led me in the right direction. heartbeat knows what to do, but on RHEL/CentOS/SL 6.x cluster-glue no longer includes stonith agents. Some info at http://www.gossamer-threads.com/lists/linuxha/pacemaker/74487 So I rebuilt the RPMs for cluster-glue, reversing that removal. It is a dicey proposition, of course, to set up a cluster that I expect to be long-lived based on software that folks are running to deprecate. But I have played with corosync + pacemaker extensively, and TBH they are way overkill for a simple setup. Is there a _simple_ setup guide for a two-node cluster? Y'know, LVM, a couple of mountpoints, one server daemon (mysql)? I am not afraid of complexity; but I like to pick where to invest in complexity :-) cheers,
The easiest, native way under RHEL/CentOS is to use corosync + cman + rgmanager. The configuration you are describing will be simple, will be properly supported until 2020 (at least), and will not need hacks. If you're interested in this approach, I can help. Here or on #linux-cluster on freenode's IRC. digimer -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Linux-HA] Heartbeat v1 and stonith/stonith_host ipmilan
On Wed, Jul 17, 2013 at 9:34 PM, Digimer li...@alteeve.ca wrote: The easiest, native way under RHEL/CentOS is to use corosync + cman + rgmanager. The configuration you are describing will be simple and will be properly supported until 2020 (at least), and not need hacks. If you're interested in this approach, I can help. Here or on #linux-cluster on freenode's IRC. Thanks for the offer to help. Is there any clear setup guide you can point me to? My TZ is EDT, so midnight (bedtime!) now. I won't be awake and on email/irc until tomorrow morning. m -- martin.langh...@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff
Re: [Linux-HA] Heartbeat v1 and stonith/stonith_host ipmilan
On 18/07/13 00:12, Martin Langhoff wrote: On Wed, Jul 17, 2013 at 9:34 PM, Digimer li...@alteeve.ca wrote: The easiest, native way under RHEL/CentOS is to use corosync + cman + rgmanager. The configuration you are describing will be simple and will be properly supported until 2020 (at least), and not need hacks. If you're interested in this approach, I can help. Here or on #linux-cluster on freenode's IRC. Thanks for the offer to help. Is there any clear setup guide you can point me to? My TZ is EDT, so midnight (bedtime!) now. I won't be awake and on email/irc until tomorrow morning.
Heh, same timezone, but I'm more of a night owl. :) I have a tutorial that was written for people who want to host highly-available VMs on a two-node Red Hat cluster. It goes into a lot of detail that you may not be interested in, but I think it's pretty comprehensive (I tried to assume no prior knowledge of HA). So perhaps you can tease out the parts you're interested in. https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial
Your configuration would need, basically:
* Node definitions with fence methods defined
* A resource section covering your storage and daemon
* A failover domain to control which node is primary for a given service and which is the backup
The tutorial covers clustered LVM and uses the GFS2 clustered file system. So it anticipates a somewhat complex setup. If you are looking for simple failover, you can skip all of that. You could even dump LVM altogether, if your goal is simply to support MySQL's data storage. So the config, in this case, would be:
* The cluster name is foo
* This is a two-node cluster (disable quorum)
** Node 1 is this, and here is how you fence it
** Node 2 is this, and here is how you fence it
* Resources:
** I have a file system resource called X mounted at Y
** I have a script resource that controls daemon Z
* Failover domain:
** I have an ordered domain that says run on node 1 when possible, node 2 otherwise.
if you fail over to node 2, stay there when node 1 returns
* Service:
** Create an ordered service that follows the rules set in the failover domain. This service requires the FS to mount before the daemon service starts. Stop in the reverse order.
That's it. It might seem a little overwhelming at first, but it really is pretty simple. You already understand the concept of fencing, which trips up most people, so you're more than half-way there. So long as your switch handles multicast, you're golden. If not, no big deal, just add the configuration option that forces unicast mode. hope this helps -- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
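The outline Digimer describes could be sketched as a cluster.conf along the following lines. This is untested and every name, device, and fence parameter below is a placeholder; validate any real config against your own environment before relying on it:

```xml
<?xml version="1.0"?>
<cluster name="foo" config_version="1">
  <!-- two-node cluster: disable normal quorum rules -->
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1.example.com" nodeid="1">
      <fence><method name="ipmi"><device name="fence_n1"/></method></fence>
    </clusternode>
    <clusternode name="node2.example.com" nodeid="2">
      <fence><method name="ipmi"><device name="fence_n2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="fence_n1" agent="fence_ipmilan" ipaddr="10.0.0.1" login="admin" passwd="secret"/>
    <fencedevice name="fence_n2" agent="fence_ipmilan" ipaddr="10.0.0.2" login="admin" passwd="secret"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <!-- ordered: prefer node 1; nofailback: stay on node 2 after a failover -->
      <failoverdomain name="primary_n1" ordered="1" nofailback="1">
        <failoverdomainnode name="node1.example.com" priority="1"/>
        <failoverdomainnode name="node2.example.com" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <fs name="data" device="/dev/sda1" mountpoint="/srv/mysql" fstype="ext4"/>
      <script name="mysqld" file="/etc/init.d/mysqld"/>
    </resources>
    <!-- nesting the script inside the fs enforces mount-then-start, stop in reverse -->
    <service name="db" domain="primary_n1" recovery="relocate">
      <fs ref="data">
        <script ref="mysqld"/>
      </fs>
    </service>
  </rm>
</cluster>
```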
Re: [Linux-HA] Heartbeat haresources with IPv6
Hi Thiago, Heartbeat is deprecated and has not been developed in some time. There are no plans to restart development, either. It is _strongly_ advised that new setups use corosync + pacemaker. You can use the IPv6 resource agents with it, too. The best place to look is clusterlabs.org's Cluster from Scratch tutorial. It covers, as the first example, setting up an (IPv4) virtual IP address. It should be easy to adapt that to your IPv6 implementation. You will see two versions: one for crmsh and one for pcs. I would recommend the crmsh version for Ubuntu. Cheers
On 06/17/2013 11:35 AM, lis...@adminlinux.com.br wrote: Hi, I'm using Ubuntu 12.04 + Heartbeat 3.0.5-3ubuntu2 to provide high availability for some IP addresses. I want to configure an IPv6 address in my haresources. I did this:
File /etc/heartbeat/haresources:
server.domain.com \
    192.168.2.62/32/eth1 \
    192.168.2.64/32/eth1 \
    192.168.2.72/32/eth1 \
    IPv6addr::2001:db8:38a5:8::2006/48/eth1 \
    MailTo::a...@domain.com
The IPv4 addresses work fine, but I'm not getting success with the IPv6 address. My logs show this message:
ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr 2001:db8:38a5:8 2006/48/eth1 start
ResourceManager[22129]: CRIT: Giving up resources due to failure of IPv6addr::2001:db8:38a5:8::2006/48/eth1
ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr 2001:db8:38a5:8 2006/48/eth1 stop
ResourceManager[22129]: info: Retrying failed stop operation [IPv6addr::2001:db8:38a5:8::2006/48/eth1]
Apparently there is a conflict between the characters '::' inside the IPv6 address and the separator '::' used in the haresources. But I would not like to have to expand the IPv6 address. Does anyone know a way to avoid this conflict? Thanks!
-- Thiago Henrique www.adminlinux.com.br
-- Digimer Papers and Projects: https://alteeve.ca/w/ What if the cure for cancer is trapped in the mind of a person without access to education?
Re: [Linux-HA] Heartbeat haresources with IPv6
On Fri, Jun 14, 2013 at 03:29:49PM -0300, lis...@adminlinux.com.br wrote: Hi, I'm using Ubuntu 12.04 + Heartbeat 3.0.5-3ubuntu2 to provide high availability for some IP addresses. I want to configure an IPv6 address in my haresources. I did this:
File /etc/heartbeat/haresources:
server.domain.com \
    192.168.2.62/32/eth1 \
    192.168.2.64/32/eth1 \
    192.168.2.72/32/eth1 \
    IPv6addr::2001:db8:38a5:8::2006/48/eth1 \
    MailTo::a...@domain.com
The IPv4 addresses work fine, but I'm not getting success with the IPv6 address. My logs show this message:
ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr 2001:db8:38a5:8 2006/48/eth1 start
ResourceManager[22129]: CRIT: Giving up resources due to failure of IPv6addr::2001:db8:38a5:8::2006/48/eth1
ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr 2001:db8:38a5:8 2006/48/eth1 stop
ResourceManager[22129]: info: Retrying failed stop operation [IPv6addr::2001:db8:38a5:8::2006/48/eth1]
Apparently there is a conflict between the characters '::' inside the IPv6 address and the separator '::' used in the haresources. But I would not like to have to expand the IPv6 address. Does anyone know a way to avoid this conflict?
You can't have it all ;-) I see several options.
- use 2001:db8:38a5:8:0:0:0:2006/48/eth1
- abandon haresources
- hack the ResourceManager script of heartbeat, allow for escaping, or special-case IPv6addr or similar... it's plain shell after all
- hack the resource.d/IPv6addr *wrapper* script only, to mangle the input parameters.
The last two options would look something like below. You need only *one* of these, though using both would not hurt.
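The splitting can be reproduced by hand: running the failing haresources entry through the same sed pipeline that ResourceManager's resource2arg case uses shows the address coming out in two pieces, exactly as in the logs above:

```shell
# haresources separates arguments with '::'; applied to an entry whose
# argument itself contains '::', the second sed splits the address.
res='IPv6addr::2001:db8:38a5:8::2006/48/eth1'
echo "$res" | sed 's%[^:]*::%%' | sed 's%::% %g'
# -> 2001:db8:38a5:8 2006/48/eth1
```

The first sed strips the resource name up to the first '::'; the second, global one then turns the compressed-zeros '::' inside the address into a space, so the agent is invoked with two mangled arguments instead of one address.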
Untested, and likely whitespace mangled ;-)

--- ResourceManager
+++ ResourceManager
@@ -167,6 +167,11 @@ resource2script() {
 # multiple arguments are separated by :: delimiters
 resource2arg() {
 	case `canonname $1` in
+	IPv6addr::*)
+		# special case: there is only one argument,
+		# and it contains ::
+		echo $1 | sed 's%[^:]*::%%'
+		;;
 	*::*)
 		echo $1 | sed 's%[^:]*::%%' | sed 's%::% %g'
 		;;
 	esac

--- IPv6addr
+++ IPv6addr
@@ -17,6 +17,8 @@ usage() {
 	exit 1
 }
 
+[ $# = 3 ] && set -- $1::$2 $3
+
 if [ $# != 2 ]; then
 	usage
 fi

Cheers,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
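The delimiter clash itself is easy to reproduce outside heartbeat. A hedged standalone sketch (my own, not part of Lars's patch) of what the stock resource2arg sed pipeline does to an abbreviated IPv6 spec, versus the special-cased variant:

```shell
#!/bin/sh
# The haresources entry as ResourceManager sees it (resource name,
# then arguments, all joined with '::').
spec='IPv6addr::2001:db8:38a5:8::2006/48/eth1'

# Stock resource2arg: strip the leading 'name::', then turn every
# remaining '::' into an argument separator -- which cuts the
# abbreviated IPv6 address in two.
stock=$(echo "$spec" | sed 's%[^:]*::%%' | sed 's%::% %g')
echo "stock:   $stock"    # 2001:db8:38a5:8 2006/48/eth1  (two args, broken)

# Special-cased: strip only the leading 'IPv6addr::' and keep the
# rest intact, matching the failed 'start' seen in the logs vs. the
# behavior the ResourceManager hunk would give.
fixed=$(echo "$spec" | sed 's%[^:]*::%%')
echo "special: $fixed"    # 2001:db8:38a5:8::2006/48/eth1  (one arg, intact)
```

The first output matches the mangled arguments visible in the poster's logs; the second preserves the address so the agent can parse it.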
Re: [Linux-HA] heartbeat 'ERROR' messages
I know it's tacky to reply to myself, but I can answer one of my questions after another 15 minutes or so of poring through logs:

On Tue, 2013-05-28 at 10:37 -0600, Greg Woods wrote:
> The questions are what do these messages actually mean, why is one
> cluster logging them and not the other, and is this something I should
> be worried about?

The answer to the last one is that this is definitely a problem, because after nearly half an hour, this is logged:

May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] : [src=vmx1.ucar.edu]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] : [(1)srcuuid=0x5ceb390(36 27)]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] : [seq=3a4]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] : [hg=4c97c17a]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ts=51a13888]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] : [ld=0.50 0.33 0.28 3/316 13859]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[10] : [ttl=3]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[11] : [auth=1 feb94da356847a538290ea75f27423c996c0a595]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: write_child: Exiting due to persistent errors: No such device
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: WARN: Managed HBWRITE process 5689 exited with return code 1.
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: ERROR: HBWRITE process died. Beginning communications restart process for comm channel 1.
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth4 - Status: 1
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: WARN: Managed HBREAD process 5690 killed by signal 9 [SIGKILL - Kill, unblockable].
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: ERROR: Both comm processes for channel 1 have died. Restarting.
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: info: glib: UDP Broadcast heartbeat started on port 694 (694) interface eth4
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: info: glib: UDP Broadcast heartbeat closed on port 694 interface eth4 - Status: 1
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: info: Communications restart succeeded.
May 25 16:17:45 vmx1.ucar.edu heartbeat: [5683]: info: Link vmx2.ucar.edu:eth4 up.

And VMs stop being reachable, etc. The only way to stabilize things is to not start heartbeat on one of the nodes (vmx1, arbitrarily chosen) and run all resources on a single node (vmx2 in this case).

--Greg

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat 'ERROR' messages
On 29/05/2013, at 2:37 AM, Greg Woods wo...@ucar.edu wrote:
> I have two clusters that are both running CentOS 5.6 and
> heartbeat-3.0.3-2.3.el5 (from the clusterlabs repo). They are running
> slightly different pacemaker versions (pacemaker-1.0.9.1-1.15.el5 on
> the first one and pacemaker-1.0.12-1.el5 on the other). They both have
> identical ha.cf files except that the bcast device names are different
> (and they are correct for each case, I checked), like this:
>
>   udpport 694
>   bcast eth2
>   bcast eth1
>   use_logd off
>   logfile /var/log/halog
>   debugfile /var/log/hadebug
>   debug 1
>   keepalive 2
>   deadtime 15
>   initdead 60
>   node vmd1.ucar.edu
>   node vmd2.ucar.edu
>   auto_failback off
>   respawn hacluster /usr/lib64/heartbeat/ipfail
>   crm respawn

I don't know about the rest, but definitely do not use both ipfail and crm. Pick one :)

> On one of them (which may or may not coincidentally be having some
> problems), I get these messages logged about every 2 seconds in
> /var/log/halog; on the other I don't see them:
>
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG: Dumping message with 10 fields
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[0] : [t=NS_ackmsg]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[1] : [dest=vmx2.ucar.edu]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] : [ackseq=3a0]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[3] : [(1)destuuid=0x5ceb280(37 28)]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] : [src=vmx1.ucar.edu]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] : [(1)srcuuid=0x5ceb390(36 27)]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] : [hg=4c97c17a]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] : [ts=51a13435]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ttl=3]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] : [auth=1 23b556bcb61a08abecf87cb6411c62e62cf99f0d]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG: Dumping message with 12 fields
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[0] : [t=status]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[1] : [st=active]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] : [dt=3a98]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[3] : [protocol=1]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] : [src=vmx1.ucar.edu]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] : [(1)srcuuid=0x5ceb390(36 27)]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] : [seq=17b]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] : [hg=4c97c17a]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ts=51a13435]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] : [ld=0.27 0.41 0.26 1/315 19183]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[10] : [ttl=3]
>   May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[11] : [auth=1 3d3da4df831636f7c274395041ffb49bbf215170]
>
> The questions are what do these messages actually mean, why is one
> cluster logging them and not the other, and is this something I should
> be worried about?
>
> Thanks for any info,
> --Greg

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
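For reference, the quoted ha.cf with Andrew's advice applied (the ipfail respawn dropped, crm kept) would look something like this. A hedged sketch only, not a tested configuration; node names and interfaces are of course site-specific:

```
udpport 694
bcast eth2
bcast eth1
use_logd off
keepalive 2
deadtime 15
initdead 60
node vmd1.ucar.edu
node vmd2.ucar.edu
auto_failback off
crm respawn
```

With crm enabled, connectivity-based failover is handled by Pacemaker (e.g. ocf:pacemaker:ping) rather than by ipfail.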
Re: [Linux-HA] heartbeat 'ERROR' messages
On Wed, 2013-05-29 at 07:50 +1000, Andrew Beekhof wrote: respawn hacluster /usr/lib64/heartbeat/ipfail crm respawn I don't know about the rest, but definitely do not use both ipfail and crm. Pick one :) I guess I will have to look into what ipfail really does. I have a half dozen clusters that have virtually the same ha.cf files and they have been running for 2+ years with it specified this way. --Greg ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat 'ERROR' messages
On 29/05/2013, at 8:05 AM, Greg Woods wo...@ucar.edu wrote: On Wed, 2013-05-29 at 07:50 +1000, Andrew Beekhof wrote: respawn hacluster /usr/lib64/heartbeat/ipfail crm respawn I don't know about the rest, but definitely do not use both ipfail and crm. Pick one :) I guess I will have to look into what ipfail really does. With crm enabled, nothing. Try http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/_moving_resources_due_to_connectivity_changes.html I have a half dozen clusters that have virtually the same ha.cf files and they have been running for 2+ years with it specified this way. --Greg ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
Hi Nick, Could you tell us which version of resource-agents you're using? Prior to 3.9.2, IPv6addr requires a static IPv6 address with exactly the same prefix in order to find an appropriate nic; so you should have statically assigned 2600:3c00::34:c003/116 on eth0, for example. As of 3.9.3, this has been relaxed and the specified nic is always used even if the prefix does not match, so it should just work. (At least it works for me.) Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a virtual IPv6 address, which is brand new, and I would prefer it because it uses the standard ip command. Thanks, 2013/3/25 Nick Walke tubaguy50...@gmail.com: This the correct place to report bugs? https://github.com/ClusterLabs/resource-agents Nick On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Nick, I shouldn't be able to do that if the IPv6 module wasn't loaded, correct? that is correct. I tried modifying my netmask to copy yours. And I get the same error you do: ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown error So probably a bug in the resource agent.
Manually adding and removing works: (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0 (node-62) [~] ip -6 addr show dev eth0 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2a01:4f8:bb:400::2/116 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591887sec preferred_lft 604687sec inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0 Nick, you can do the following things to resolve this: - Hunt down the bug and fix it or let someone else do it for you - Use another netmask, if possible (fighting the symptoms instead of resolving the root cause) - Write your own resource agent (fighting the symptoms instead of resolving the root cause) Cheers, Thomas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
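Keisuke's IPaddr2 suggestion would look roughly like this in crm shell syntax. A hedged sketch only, assuming resource-agents >= 3.9.5 (where IPaddr2 accepts an IPv6 address); the resource name, address, prefix, and nic below are illustrative, not from the thread:

```
# Hypothetical IPv6 virtual IP managed by IPaddr2 (resource-agents >= 3.9.5),
# which drives the standard 'ip' command rather than IPv6addr's own logic.
primitive vip6 ocf:heartbeat:IPaddr2 \
    params ip=2001:db8::10 cidr_netmask=64 nic=eth0 \
    op monitor interval=15s
```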
Re: [Linux-HA] Heartbeat IPv6addr OCF
Looks like 3.9.2-5. So I need to statically assign the address I want to use before using it with IPv6addr? On Mar 25, 2013 3:44 AM, Keisuke MORI keisuke.mori...@gmail.com wrote: Hi Nick, Could you privide which version of resource-agents you're using? Prior to 3.9.2, IPv6addr requires a static IPv6 address with the exactly same prefix to find out an apropriate nic; so you should have statically assigned 2600:3c00::34:c003/116 on eth0 for example. As of 3.9.3, it has relaxed and the specified nic is always used no matter if the prefix does not match; so it should just work. (at least it works for me) Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a virtual IPv6 address, which is brand new and I would prefer this because it uses the standard ip command. Thanks, 2013/3/25 Nick Walke tubaguy50...@gmail.com: This the correct place to report bugs? https://github.com/ClusterLabs/resource-agents Nick On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Nick, I shouldn't be able to do that if the IPv6 module wasn't loaded, correct? that is correct. I tried modifying my netmask to copy yours. And I get the same error, you do: ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown error So probably a bug in the resource agent. 
Manually adding and removing works: (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0 (node-62) [~] ip -6 addr show dev eth0 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2a01:4f8:bb:400::2/116 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591887sec preferred_lft 604687sec inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0 Nick, you can do the following things to resolve this: - Hunt down the bug and fix it or let someone else do it for you - Use another netmask, if possible (fighting the symptoms instead of resolving the root cause) - Write your own resource agent (fighting the symptoms instead of resolving the root cause) Cheers, Thomas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
2013/3/25 Nick Walke tubaguy50...@gmail.com: Looks like 3.9.2-5. So I need to statically assign the address I want to use before using it with IPv6addr? Yes. On Mar 25, 2013 3:44 AM, Keisuke MORI keisuke.mori...@gmail.com wrote: Hi Nick, Could you privide which version of resource-agents you're using? Prior to 3.9.2, IPv6addr requires a static IPv6 address with the exactly same prefix to find out an apropriate nic; so you should have statically assigned 2600:3c00::34:c003/116 on eth0 for example. As of 3.9.3, it has relaxed and the specified nic is always used no matter if the prefix does not match; so it should just work. (at least it works for me) Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a virtual IPv6 address, which is brand new and I would prefer this because it uses the standard ip command. Thanks, 2013/3/25 Nick Walke tubaguy50...@gmail.com: This the correct place to report bugs? https://github.com/ClusterLabs/resource-agents Nick On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Nick, I shouldn't be able to do that if the IPv6 module wasn't loaded, correct? that is correct. I tried modifying my netmask to copy yours. And I get the same error, you do: ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown error So probably a bug in the resource agent. 
Manually adding and removing works: (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0 (node-62) [~] ip -6 addr show dev eth0 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2a01:4f8:bb:400::2/116 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591887sec preferred_lft 604687sec inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0 Nick, you can do the following things to resolve this: - Hunt down the bug and fix it or let someone else do it for you - Use another netmask, if possible (fighting the symptoms instead of resolving the root cause) - Write your own resource agent (fighting the symptoms instead of resolving the root cause) Cheers, Thomas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
That works. Thanks! Nick On Mon, Mar 25, 2013 at 4:22 AM, Keisuke MORI keisuke.mori...@gmail.comwrote: 2013/3/25 Nick Walke tubaguy50...@gmail.com: Looks like 3.9.2-5. So I need to statically assign the address I want to use before using it with IPv6addr? Yes. On Mar 25, 2013 3:44 AM, Keisuke MORI keisuke.mori...@gmail.com wrote: Hi Nick, Could you privide which version of resource-agents you're using? Prior to 3.9.2, IPv6addr requires a static IPv6 address with the exactly same prefix to find out an apropriate nic; so you should have statically assigned 2600:3c00::34:c003/116 on eth0 for example. As of 3.9.3, it has relaxed and the specified nic is always used no matter if the prefix does not match; so it should just work. (at least it works for me) Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a virtual IPv6 address, which is brand new and I would prefer this because it uses the standard ip command. Thanks, 2013/3/25 Nick Walke tubaguy50...@gmail.com: This the correct place to report bugs? https://github.com/ClusterLabs/resource-agents Nick On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Nick, I shouldn't be able to do that if the IPv6 module wasn't loaded, correct? that is correct. I tried modifying my netmask to copy yours. And I get the same error, you do: ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown error So probably a bug in the resource agent. 
Manually adding and removing works: (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0 (node-62) [~] ip -6 addr show dev eth0 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2a01:4f8:bb:400::2/116 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591887sec preferred_lft 604687sec inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0 Nick, you can do the following things to resolve this: - Hunt down the bug and fix it or let someone else do it for you - Use another netmask, if possible (fighting the symptoms instead of resolving the root cause) - Write your own resource agent (fighting the symptoms instead of resolving the root cause) Cheers, Thomas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- Keisuke MORI ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
Hello,

> ipv6addr=2600:3c00::0034:c007

From the manpage of ocf_heartbeat_IPv6addr it looks like you have to specify the netmask, so try:

ipv6addr=2600:3c00::0034:c007/64

assuming that you're in a /64.

Cheers,
Thomas

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
Thanks for the tip, however, it did not work. That's actually a /116. So I put in 2600:3c00::0034:c007/116 and am getting the same error. I requested that it restart the resource as well, just to make sure it wasn't the previous error.

Nick

On Sun, Mar 24, 2013 at 3:55 AM, Thomas Glanzmann tho...@glanzmann.de wrote:
> Hello,
> ipv6addr=2600:3c00::0034:c007
> From the manpage of ocf_heartbeat_IPv6addr it looks like you have to
> specify the netmask, so try:
> ipv6addr=2600:3c00::0034:c007/64
> assuming that you're in a /64.
> Cheers, Thomas

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
Hello Nick Try to use nic=eth0 instead of nic=eth0:3 thanks 2013/3/24 Nick Walke tubaguy50...@gmail.com Thanks for the tip, however, it did not work. That's actually a /116. So I put in 2600:3c00::0034:c007/116 and am getting the same error. I requested that it restart the resource as well, just to make sure it wasn't the previous error. Nick On Sun, Mar 24, 2013 at 3:55 AM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello, ipv6addr=2600:3c00::0034:c007 from the manpage of ocf_heartbeat_IPv6addr it looks like that you have to specify the netmask so try: ipv6addr=2600:3c00::0034:c007/64 assuiming that you're in a /64. Cheers, Thomas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems -- esta es mi vida e me la vivo hasta que dios quiera ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
Hello Nick,

> Thanks for the tip, however, it did not work. That's actually a /116.
> So I put in 2600:3c00::0034:c007/116 and am getting the same error. I
> requested that it restart the resource as well, just to make sure it
> wasn't the previous error.

now, I had to try it:

node $id=9d9b62d2-405d-459a-a724-cb2643d7d9a1 node-62
primitive ipv6test ocf:heartbeat:IPv6addr \
    params ipv6addr=2a01:4f8:bb:400::2/64 \
    op monitor interval=15 timeout=15 \
    meta target-role=Started
property $id=cib-bootstrap-options \
    dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \
    cluster-infrastructure=Heartbeat \
    stonith-enabled=false

And it works:

(node-62) [~] ifconfig
eth0      Link encap:Ethernet  HWaddr 00:25:90:97:db:b0
          inet addr:10.100.4.62  Bcast:10.100.255.255  Mask:255.255.0.0
          inet6 addr: 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 Scope:Global
          inet6 addr: fe80::225:90ff:fe97:dbb0/64 Scope:Link
          inet6 addr: 2a01:4f8:bb:400::2/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:40345 errors:0 dropped:0 overruns:0 frame:0
          TX packets:10270 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:52540127 (50.1 MiB)  TX bytes:1127817 (1.0 MiB)
          Memory:fb58-fb60

(infra) [~] traceroute 2a01:4f8:bb:400::2
traceroute to 2a01:4f8:bb:400::2 (2a01:4f8:bb:400::2), 30 hops max, 80 byte packets
 1  merlin.glanzmann.de (2a01:4f8:bb:4ff::1)  1.413 ms  1.550 ms  1.791 ms
 2  2a01:4f8:bb:400::2 (2a01:4f8:bb:400::2)  0.204 ms  0.202 ms  0.270 ms

Cheers,
Thomas

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
On Sun, 2013-03-24 at 01:36 -0700, tubaguy50035 wrote: params ipv6addr=2600:3c00::0034:c007 nic=eth0:3 \ Are you sure that's a valid IPV6 address? I get headaches every time I look at these, but it seems a valid address is 8 groups, and you've got 5 there. Maybe you mean 2600:3c00::0034:c007? --Greg ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
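(An aside, not from the thread: the '::' shorthand is what makes only five groups appear, so the abbreviated form is in fact legal. A quick check with Python's stdlib shows the expansion.)

```shell
# '::' stands in for however many all-zero groups are missing, so the
# five visible groups expand to the full eight-group address.
python3 -c 'import ipaddress; print(ipaddress.IPv6Address("2600:3c00::0034:c007").exploded)'
# prints: 2600:3c00:0000:0000:0000:0000:0034:c007
```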
Re: [Linux-HA] Heartbeat IPv6addr OCF
I don't know what I'm doing wrong then. I copied exactly what you put in and now I'm getting these errors:

ipv6test_start_0 (node=tek-lin-lb1, call=25, rc=1, status=complete): unknown error
ipv6test_start_0 (node=tek-lin-lb2, call=20, rc=1, status=complete): unknown error

Looking in my syslog I see:

Mar 24 14:37:13 tek-lin-lb2 IPv6addr: [8038]: ERROR: no valid mecahnisms
Mar 24 14:37:13 tek-lin-lb2 lrmd: [3005]: info: operation start[18] on ipv6test for client 3008: pid 8038 exited with return code 1
Mar 24 14:37:13 tek-lin-lb2 crmd: [3008]: info: process_lrm_event: LRM operation ipv6test_start_0 (call=18, rc=1, cib-update=65, confirmed=true) unknown error

Anything I need to do to allow IPv6... or something?

Nick

On Sun, Mar 24, 2013 at 4:29 AM, Thomas Glanzmann tho...@glanzmann.de wrote: Hello Nick, Thanks for the tip, however, it did not work. That's actually a /116. So I put in 2600:3c00::0034:c007/116 and am getting the same error. I requested that it restart the resource as well, just to make sure it wasn't the previous error.
now, I had to try it: node $id=9d9b62d2-405d-459a-a724-cb2643d7d9a1 node-62 primitive ipv6test ocf:heartbeat:IPv6addr \ params ipv6addr=2a01:4f8:bb:400::2/64 \ op monitor interval=15 timeout=15 \ meta target-role=Started property $id=cib-bootstrap-options \ dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \ cluster-infrastructure=Heartbeat \ stonith-enabled=false And it works: (node-62) [~] ifconfig eth0 Link encap:Ethernet HWaddr 00:25:90:97:db:b0 inet addr:10.100.4.62 Bcast:10.100.255.255 Mask:255.255.0.0 inet6 addr: 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 Scope:Global inet6 addr: fe80::225:90ff:fe97:dbb0/64 Scope:Link inet6 addr: 2a01:4f8:bb:400::2/64 Scope:Global UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:40345 errors:0 dropped:0 overruns:0 frame:0 TX packets:10270 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:52540127 (50.1 MiB) TX bytes:1127817 (1.0 MiB) Memory:fb58-fb60 (infra) [~] traceroute 2a01:4f8:bb:400::2 traceroute to 2a01:4f8:bb:400::2 (2a01:4f8:bb:400::2), 30 hops max, 80 byte packets 1 merlin.glanzmann.de (2a01:4f8:bb:4ff::1) 1.413 ms 1.550 ms 1.791 ms 2 2a01:4f8:bb:400::2 (2a01:4f8:bb:400::2) 0.204 ms 0.202 ms 0.270 ms Cheers, Thomas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
Hello Nick, Anything I need to do to allow IPv6... or something? I agree with Greg here. Have you tried setting the address manually? ip -6 addr add ip/cidr dev eth0 ip -6 addr show dev eth0 ip -6 addr del ip/cidr dev eth0 ip -6 addr show dev eth0 (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::3/64 dev eth0 (node-62) [~] ip -6 addr show dev eth0 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2a01:4f8:bb:400::3/64 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400::2/64 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591998sec preferred_lft 604798sec inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::3/64 dev eth0 (node-62) [~] ip -6 addr show dev eth0 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2a01:4f8:bb:400::2/64 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591990sec preferred_lft 604790sec inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever Do you see a link local address on your eth0? A link local address is one that starts with fe80:: otherwise try loading the ipv6 module: modprobe ipv6 # Don't know if that is the right module name, all my # kernels have ipv6 build in (Debian wheezy / squeeze / backports) Cheers, Thomas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
From the first node: nick@tek-lin-lb1:~$ sudo ip -6 addr add 2600:3c00::34:c007/116 dev eth0 nick@tek-lin-lb1:~$ sudo ip -6 addr show dev eth0 3: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2600:3c00::34:c007/116 scope global valid_lft forever preferred_lft forever inet6 2600:3c00::f03c:91ff:fe70:7541/64 scope global dynamic valid_lft 43200sec preferred_lft 43200sec inet6 2600:3c00::34:c003/64 scope global valid_lft forever preferred_lft forever inet6 fe80::f03c:91ff:fe70:7541/64 scope link valid_lft forever preferred_lft forever nick@tek-lin-lb1:~$ sudo ip -6 addr del 2600:3c00::34:c007/116 dev eth0 nick@tek-lin-lb1:~$ sudo ip -6 addr show dev eth0 3: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2600:3c00::f03c:91ff:fe70:7541/64 scope global dynamic valid_lft 43200sec preferred_lft 43200sec inet6 2600:3c00::34:c003/64 scope global valid_lft forever preferred_lft forever inet6 fe80::f03c:91ff:fe70:7541/64 scope link valid_lft forever preferred_lft forever From the second node: nick@tek-lin-lb2:~$ sudo ip -6 addr add 2600:3c00::34:c007/116 dev eth0 nick@tek-lin-lb2:~$ sudo ip -6 addr show dev eth0 3: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2600:3c00::34:c007/116 scope global valid_lft forever preferred_lft forever inet6 2600:3c00::f03c:91ff:fe70:f0a4/64 scope global dynamic valid_lft 43190sec preferred_lft 43190sec inet6 2600:3c00::34:c005/64 scope global valid_lft forever preferred_lft forever inet6 fe80::f03c:91ff:fe70:f0a4/64 scope link valid_lft forever preferred_lft forever nick@tek-lin-lb2:~$ sudo ip -6 addr del 2600:3c00::34:c007/116 dev eth0 nick@tek-lin-lb2:~$ sudo ip -6 addr show dev eth0 3: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2600:3c00::f03c:91ff:fe70:f0a4/64 scope global dynamic valid_lft 43197sec preferred_lft 43197sec inet6 2600:3c00::34:c005/64 scope global valid_lft forever preferred_lft forever inet6 fe80::f03c:91ff:fe70:f0a4/64 scope link valid_lft forever 
preferred_lft forever I shouldn't be able to do that if the IPv6 module wasn't loaded, correct? So it seems like it is. Nick On Sun, Mar 24, 2013 at 3:16 PM, Thomas Glanzmann tho...@glanzmann.dewrote: Hello Nick, Anything I need to do to allow IPv6... or something? I agree with Greg here. Have you tried setting the address manually? ip -6 addr add ip/cidr dev eth0 ip -6 addr show dev eth0 ip -6 addr del ip/cidr dev eth0 ip -6 addr show dev eth0 (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::3/64 dev eth0 (node-62) [~] ip -6 addr show dev eth0 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2a01:4f8:bb:400::3/64 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400::2/64 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591998sec preferred_lft 604798sec inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::3/64 dev eth0 (node-62) [~] ip -6 addr show dev eth0 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000 inet6 2a01:4f8:bb:400::2/64 scope global valid_lft forever preferred_lft forever inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591990sec preferred_lft 604790sec inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever Do you see a link local address on your eth0? A link local address is one that starts with fe80:: otherwise try loading the ipv6 module: modprobe ipv6 # Don't know if that is the right module name, all my # kernels have ipv6 build in (Debian wheezy / squeeze / backports) Cheers, Thomas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat IPv6addr OCF
Hello Nick,

> I shouldn't be able to do that if the IPv6 module wasn't loaded, correct?

That is correct. I tried modifying my netmask to copy yours, and I get the same error you do:

ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown error

So it is probably a bug in the resource agent. Manually adding and removing works:

(node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
(node-62) [~] ip -6 addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
    inet6 2a01:4f8:bb:400::2/116 scope global valid_lft forever preferred_lft forever
    inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic valid_lft 2591887sec preferred_lft 604687sec
    inet6 fe80::225:90ff:fe97:dbb0/64 scope link valid_lft forever preferred_lft forever
(node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0

Nick, you can do the following things to resolve this:

- Hunt down the bug and fix it, or let someone else do it for you
- Use another netmask, if possible (fighting the symptoms instead of resolving the root cause)
- Write your own resource agent (fighting the symptoms instead of resolving the root cause)

Cheers,
        Thomas
Re: [Linux-HA] Heartbeat IPv6addr OCF
Is this the correct place to report bugs? https://github.com/ClusterLabs/resource-agents

Nick

On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de wrote:
> [full quote of the previous message snipped]
Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect
On Thu, Dec 13, 2012 at 12:41:29AM +0100, Lars Marowsky-Bree wrote:
> On 2012-12-13T10:31:55, Andrew Beekhof and...@beekhof.net wrote:
> > > We once moved the ocf-shellfuncs file, which didn't work out here when
> > I thought we never did this sort of thing, because we don't know how
> > people are using our stuff externally.
> We did it in a backwards-compatible manner; or at least if the
> packagers choose to, they could symlink the old location to the new
> one. (That is the default for the included spec files, I think.)

Right. Whichever way some other RA may have used the shellfuncs, it would continue to work with the new package. That obviously needed to be supported. The old filenames started with '.', which set a precedent and was not well received by all distributions.

Thanks, Dejan

> So yes, we try hard to never break updating, and to provide migration
> over several releases. None of the functions changed names, all the
> variables are still there, we don't drop agent attributes, that kind of
> stuff. But copying in a new agent that has the new path embedded
> obviously doesn't work in the old environment.
>
> If you were trying to be snarky, I think this failed. ;-)
>
> Regards,
>     Lars
>
> -- Architect Storage/HA, SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect
Am Montag, 10. Dezember 2012, 15:59:16 schrieb codey koble:
> To anyone who could possibly help. My current setup: 2 Ubuntu 10.04 LTS
> servers running heartbeat, pacemaker, apache, and mysql. Heartbeat and
> pacemaker are running great for my needs with one exception: currently
> both nodes are showing mysql as slaves. I have mysql configured in a
> master/slave setup and that is working great on its own. I noticed when
> I tried to promote one of the servers that an error occurred stating
> that ocf:heartbeat:mysql did not support the feature. I evaluated the
> script and realized it was an older version that did not contain any of
> the promote/demote code. I found the newest code for the script in the
> github repo and replaced the entire mysql file with the new code. Upon
> doing this it then gave an error stating that the ocf:heartbeat:mysql
> resource agent was not installed. My question would be: is there a
> simple way to update the script instead of manually replacing it like I
> did, or is there a way to get the code I changed working? Thanks in
> advance for any help!

It seems that you have three options:

1) Go back to the old script and use it as a primitive resource, not a Master/Slave resource.
2) Keep the new script and debug why it does not work in your environment. Perhaps some PATH is set wrong or some packages are not installed.
3) Upgrade to 12.04 LTS. This version should reflect recent developments in the cluster software.

Perhaps try option 2) first, but in the mid term go for option 3).

-- Dr. Michael Schwartzkopff, Guardinistr. 63, 81375 München, Tel: (0163) 172 50 98
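For option 2), a quick first step is to check whether the replaced agent actually advertises the promote/demote actions. A rough sketch; the agent path below is the typical resource-agents location but may differ on your system, and the RA variable is only there to make the check easy to point elsewhere:

```shell
# Look for promote/demote actions in the installed mysql resource agent.
RA="${RA:-/usr/lib/ocf/resource.d/heartbeat/mysql}"
if grep -qE 'promote|demote' "$RA" 2>/dev/null; then
    echo "agent advertises promote/demote"
else
    echo "no promote/demote found (old agent, wrong path, or unreadable)"
fi
```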
Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect
On 12/10/2012 10:59 PM, codey koble wrote:
> [quote of the original problem description snipped]
> Upon doing this it then gave an error stating that the
> ocf:heartbeat:mysql resource agent was not installed.

Could you send the error message more precisely? Does the cluster tell you the RA is not installed (check path and file permissions), or does the LRM tell you that the RA itself returned an exit code of "not installed" (this would mean the RA does not find your mysql binaries/config/or whatever)?

> My question would be: is there a simple way to update the script instead
> of manually replacing it like I did, or is there a way to get the code I
> changed working? Thanks in advance for any help!
Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect
On 2012-12-12T13:58:25, Fabian Herschel fabian.hersc...@arcor.de wrote:
> [quote of the original problem description and Fabian's questions snipped]

We once moved the ocf-shellfuncs file, which didn't work out here when only a single script is updated and not the whole package. I suggest upgrading the whole package and then investigating.

Mit freundlichen Grüßen,
    Lars

-- Architect Storage/HA, SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect
On Thu, Dec 13, 2012 at 12:24 AM, Lars Marowsky-Bree l...@suse.com wrote:
> [earlier quotes snipped]
> We once moved the ocf-shellfuncs file, which didn't work out here when

I thought we never did this sort of thing, because we don't know how people are using our stuff externally.

> only a single script is updated and not the whole package. I suggest
> upgrading the whole package and then investigating.
>
> Mit freundlichen Grüßen,
>     Lars
>
> -- Architect Storage/HA, SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
> "Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect
On 2012-12-13T10:31:55, Andrew Beekhof and...@beekhof.net wrote:
> > We once moved the ocf-shellfuncs file, which didn't work out here when
> I thought we never did this sort of thing, because we don't know how
> people are using our stuff externally.

We did it in a backwards-compatible manner; or at least if the packagers choose to, they could symlink the old location to the new one. (That is the default for the included spec files, I think.)

So yes, we try hard to never break updating, and to provide migration over several releases. None of the functions changed names, all the variables are still there, we don't drop agent attributes, that kind of stuff. But copying in a new agent that has the new path embedded obviously doesn't work in the old environment.

If you were trying to be snarky, I think this failed. ;-)

Regards,
    Lars

-- Architect Storage/HA, SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] Heartbeat with Oracle's ASM
On 11/15/2012 05:00 AM, Hill Fang wrote:
> Hi friend: I want to know, does heartbeat support Oracle ASM now?

The heartbeat project has been deprecated for some time. There are no plans to continue its development. I am unsure of its supported state on Oracle, but regardless, I would advise you to plan on using corosync.

-- Digimer
Papers and Projects: https://alteeve.ca/w/
"What if the cure for cancer is trapped in the mind of a person without access to education?"
Re: [Linux-HA] Heartbeat with Oracle's ASM
There is an RA for Oracle that can be used with Pacemaker. Generally ASM behaves like a regular Oracle instance, so you can try it.

On Nov 15, 2012 8:57 AM, Hill Fang hill.f...@ericsson.com wrote:
> Hi friend: I want to know, does heartbeat support Oracle ASM now?
>
> HILL FANG, Engineer, Guangzhou Ericsson Communication Services Co., Ltd. (GTC), SI Support
> 2/F, NO. 1025 Gaopu Road, Tianhe Software Park, Tianhe District, Guangzhou, 510663, PR China
> Phone +86 020-85117631, Fax +86 020-29002699, SMS/MMS 15813329521
> hill.f...@ericsson.com, www.ericsson.com
Re: [Linux-HA] Heartbeat with Oracle's ASM
On 2012-11-15T10:00:21, Hill Fang hill.f...@ericsson.com wrote:
> Hi friend: I want to know, does heartbeat support Oracle ASM now?

No - and yes.

Oracle RAC (I assume that's the context for ASM?) does not tolerate any cluster solution except itself. That is not supported together with Pacemaker.

Pacemaker with the Oracle resource agent can manage a single-instance fail-over for Oracle, yes. That is supported. Postgres/MySQL too.

Regards,
    Lars

-- Architect Storage/HA, SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
Re: [Linux-HA] Heartbeat not starting when both nodes are down
El 08/10/2012 20:56, Andreas Kurz escribió:
> On 10/08/2012 09:42 PM, Nicolás wrote:
> > [original problem description snipped]
> For a new cluster use Corosync and not Heartbeat, disable the DRBD init
> script and configure DRBD as a Pacemaker master-slave resource.

Thanks for this! Once I disabled the DRBD init script it worked as it should.

Regards,
Nicolás
Re: [Linux-HA] Heartbeat not starting when both nodes are down
El 28/09/2012 20:42, Nicolás escribió:
> Hi all! I'm new to this list. I've been looking for some info about
> this but I haven't found anything, so I'm trying this way. I've
> successfully configured a 2-node cluster with DRBD + Heartbeat +
> Pacemaker. It works as expected. The problem comes when both nodes are
> down. In that case, after powering on one of the nodes, I can see it
> configuring the network, but after this I never see the console for
> this machine. So I connect via SSH and realize that Heartbeat is not
> running. After I run it manually I can see the console for this node.
> This only happens when BOTH nodes are down. When just one is,
> everything goes right, as Heartbeat starts automatically on the
> powering-on node. I see nothing relevant in the logs; my conf is as
> follows:
>
> root@cluster1:~# cat /etc/ha.d/ha.cf | grep -e '^[^#]'
> logfacility local0
> ucast eth1 192.168.0.91
> ucast eth0 192.168.20.51
> auto_failback on
> node cluster1.gamez.es cluster2.gamez.es
> use_logd yes
> crm on
> autojoin none
>
> Any ideas on what I am doing wrong? Thanks a lot in advance.
>
> Nicolás

Any ideas with this? Thanks!
Re: [Linux-HA] Heartbeat not starting when both nodes are down
On 10/08/2012 09:42 PM, Nicolás wrote:
> [original problem description and ha.cf snipped]
> Any ideas on what I am doing wrong?

Looks like an enabled DRBD init script with default startup-timeout parameters... that script blocks until the peer is connected, or until a timeout -- by default forever (depending on some configuration parameters) -- or until manual confirmation on the console... and as heartbeat is typically last in the boot process, it is not (yet) started.

For a new cluster use Corosync and not Heartbeat, disable the DRBD init script and configure DRBD as a Pacemaker master-slave resource.

Regards,
Andreas

-- Need help with Pacemaker? http://www.hastexo.com/now

> Thanks a lot in advance.
> Nicolás
> Any ideas with this? Thanks!
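The advice above ("disable the DRBD init script and configure DRBD as a Pacemaker master-slave resource") could look roughly like this in the crm shell. A sketch only: the DRBD resource name r0, the resource IDs, and the monitor intervals are hypothetical and must be adapted to the actual setup.

```
# Stop the init script from handling DRBD at boot (Debian/Ubuntu style):
#   update-rc.d -f drbd remove
# Then let Pacemaker manage DRBD (crm shell):
primitive p_drbd_r0 ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="29s" role="Master" \
    op monitor interval="31s" role="Slave"
ms ms_drbd_r0 p_drbd_r0 \
    meta master-max="1" master-node-max="1" \
         clone-max="2" clone-node-max="1" notify="true"
```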
Re: [Linux-HA] heartbeat and n-to1 clusters
On Tue, Aug 7, 2012 at 1:42 AM, Andy Furtado awf...@yahoo.com wrote:
> Hello, is it possible to set up an n-to-1 cluster configuration and have
> heartbeat manage a different VIP for each virtual pair? The n-to-1
> configuration would have a single slave node, able to take over for any
> one of the failed N masters at a time.

You'll want pacemaker on top of heartbeat for that. http://www.clusterlabs.org

> In this configuration each master node would have a static ip addr and a
> VIP. When the master fails, the VIP for that master is configured on the
> slave node, and the slave acts as that master. Once the slave node is
> acting as a master, it remains in this state and cannot take over for
> another failed master until the original master node is restored and the
> original slave transitions back to the slave state.
>
> Example configuration:
>
> masternodeA: static ip 10.1.1.1, VIP 10.1.1.101
> masternodeB: static ip 10.1.1.2, VIP 10.1.1.102
> slavenode:   static ip 10.1.1.3
>
> If masternodeA fails, slavenode becomes active as masternodeA and is
> configured with VIP 10.1.1.101. If masternodeB then fails, there is no
> failover available, since slavenode is currently acting as masternodeA.
> When masternodeA is restored, slavenode releases VIP 10.1.1.101 and is
> ready again to take over for either masternodeA or masternodeB.
>
> I understand this is not an ideal failover solution, but one I must live
> with until further design can be done. I've searched the internet and
> the HA mailing lists without much success. Any info or input would be
> appreciated.
>
> Best Regards,
> Andy
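With pacemaker on top, the per-master VIPs described above could be expressed roughly as below (crm shell syntax). This is a sketch under assumptions: all resource IDs and scores are hypothetical, and the "only one takeover at a time" rule would still need additional constraints or utilization attributes on slavenode.

```
primitive vip_a ocf:heartbeat:IPaddr2 \
    params ip="10.1.1.101" cidr_netmask="24" \
    op monitor interval="10s"
primitive vip_b ocf:heartbeat:IPaddr2 \
    params ip="10.1.1.102" cidr_netmask="24" \
    op monitor interval="10s"
# Each VIP prefers its own master, with slavenode as the common fallback:
location l_vip_a_home vip_a 100: masternodeA
location l_vip_a_fb   vip_a  50: slavenode
location l_vip_b_home vip_b 100: masternodeB
location l_vip_b_fb   vip_b  50: slavenode
```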
Re: [Linux-HA] Heartbeat Error
On Fri, Aug 3, 2012 at 5:18 PM, Yount, William D yount.will...@menloworldwide.com wrote:
> I am using pacemaker and corosync. For some reason I keep getting this
> error in my messages log:
>
> ERROR: Cannot chdir to [/var/lib/heartbeat/cores/root]: No such file or directory
>
> Should I not worry about that, since I am using corosync and not heartbeat?

Pacemaker (until a few days ago) used these directories even when used with corosync. Best to create it.

> William
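"Best to create it" means something like the following sketch, using the path from the error message. The CORES variable is only there so the commands are easy to try outside /var/lib; the real path needs root.

```shell
# Create the per-user core-dump directory that older Pacemaker builds
# chdir into, even when running on corosync.
CORES="${CORES:-/var/lib/heartbeat/cores}"
mkdir -p "$CORES/root"
ls -ld "$CORES/root"
```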
Re: [Linux-HA] Heartbeat Error [Solved]
More recent versions will create the leaf directory for you when pacemaker starts.

On Fri, Aug 3, 2012 at 5:39 PM, Yount, William D yount.will...@menloworldwide.com wrote:
> I was able to fix the error by creating the directory manually.
> /var/lib/heartbeat/cores was already there; I just added root. Kind of
> an odd problem, though.
>
> [quote of the original error report snipped]
Re: [Linux-HA] Heartbeat Error [Solved]
I was able to fix the error by creating the directory manually. /var/lib/heartbeat/cores was already there; I just added root. Kind of an odd problem, though.

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Yount, William D
Sent: Friday, August 03, 2012 2:18 AM
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] Heartbeat Error

I am using pacemaker and corosync. For some reason I keep getting this error in my messages log:

ERROR: Cannot chdir to [/var/lib/heartbeat/cores/root]: No such file or directory

Should I not worry about that, since I am using corosync and not heartbeat?

William
Re: [Linux-HA] Heartbeat isn't switching to the 2nd node when Httpd is down!
On Tue, Jul 24, 2012 at 04:01:40PM +0100, Aboubakr Seddik Ouahabi wrote:
> Hey there, I've created a thread somewhere, but I guess this is the
> right place to seek help, so here is my issue as stated there:
>
> Ok guys, that was very much appreciated and I thank you again. For now
> I just want to get heartbeat to function as it should, and I don't want
> to create a whole new thread for it. As I said before, I have one
> public IP to access the server, and 2 nodes with 2 internal IPs, both
> connected using eth0. What I want exactly is: if either one of httpd or
> MySQL goes down, the second node should take control and the virtual IP
> shall be assigned to it until everything is in sync again; then the
> primary (or favored) node should take over again.
>
> Heartbeat is starting just fine, detecting the 2 nodes. Then I tried to
> shut down one of them and see what it would say:
>
> cl_status nodestatus node02
> dead
>
> And it found it was dead, but the failover isn't happening. I've tried:
>
> service httpd stop
>
> on node01, but it didn't switch anything to anything. So what have I
> been missing in my config? Here is the config I've tried in my ha.cf:
>
> # Logging
> debug 1
> use_logd true
> logfacility daemon
> # Misc Options
> traditional_compression off
> compression bz2
> coredumps true
> # Communications
> udpport 21xxx
> bcast eth0
> ucast eth0 10.25.45.81
> ucast eth0 10.25.45.82
> autojoin any
> # Thresholds (in seconds)
> keepalive 1
> warntime 6
> deadtime 10
> initdead 15
> crm respawn
> node node01
> node node02
>
> And I've tried 2 combinations for my cib.xml:

learn to use the crm shell, so much easier on the eyes...

> 1: Code: cib configuration

I think you are missing no-quorum-policy=ignore

-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com
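Setting the property Lars mentions is a one-liner in the crm shell. A sketch: this is only appropriate for two-node clusters, where quorum can never be regained after one node fails, so without it the survivor refuses to run resources.

```
crm configure property no-quorum-policy=ignore
```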
Re: [Linux-HA] Heartbeat isn't switching to the 2nd node when Httpd is down!
On Wed, Jul 25, 2012 at 1:01 AM, Aboubakr Seddik Ouahabi ouaha...@gmail.com wrote:
> [problem description snipped: httpd/MySQL failover with one public IP,
> two nodes, and a virtual IP; heartbeat sees node02 as dead but no
> failover happens]

Is apache and mysql intended to be running on both machines at the same time?

Btw. haresources is not used for crm/pacemaker clusters.

> And here is the config I've tried in my ha.cf: [ha.cf snipped]
>
> And I've tried 2 combinations for my cib.xml:
>
> 1:
>
> <cib>
>   <configuration>
>     <crm_config/>
>     <nodes/>
>     <resources>
>       <group id="group_apache">
>         <primitive id="ipaddr" class="ocf" type="IPaddr" provider="heartbeat">
>           <instance_attributes id="ia_ipaddr">
>             <attributes>
>               <nvpair id="ia_ipaddr_ip" name="ip" value="91.xxx.xxx.xx"/>
>               <nvpair id="ia_ipaddr_nic" name="nic" value="eth0"/>
>               <nvpair id="ia_ipaddr_netmask" name="netmask" value="24"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>         <primitive id="apache" class="ocf" type="apache" provider="heartbeat">
>           <instance_attributes id="ia_apache">
>             <attributes>
>               <nvpair id="ia_apache_configfile" name="configfile" value="/etc/httpd/conf/httpd.conf"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>       </group>
>
> 2:
>
>       <group id="node01">
>         <primitive class="ocf" id="IP1" provider="heartbeat" type="IPaddr">
>           <operations>
>             <op id="IP1_mon" interval="10s" name="monitor" timeout="5s"/>
>           </operations>
>           <instance_attributes id="IP1_inst_attr">
>             <attributes>
>               <nvpair id="IP1_attr_0" name="ip" value="10.25.45.81"/>
>               <nvpair id="IP1_attr_1" name="netmask" value="255.255.255.0"/>
>               <nvpair id="IP1_attr_2" name="nic" value="eth0"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>         <primitive class="lsb" id="httpd1" provider="heartbeat" type="httpd">
>           <operations>
>             <op id="jboss1_mon" interval="30s" name="monitor" timeout="20s"/>
>           </operations>
>         </primitive>
>       </group>
>       <group id="node02">
>         <primitive class="ocf" id="IP2" provider="heartbeat" type="IPaddr">
>           <operations>
>             <op id="IP2_mon" interval="10s" name="monitor" timeout="5s"/>
>           </operations>
>           <instance_attributes id="IP2_inst_attr">
>             <attributes>
>               <nvpair id="IP2_attr_0" name="ip" value="10.25.45.82"/>
>               <nvpair id="IP2_attr_1" name="netmask" value="255.255.255.0"/>
>               <nvpair id="IP2_attr_2" name="nic" value="eth0"/>
>             </attributes>
>           </instance_attributes>
>         </primitive>
>         <primitive class="lsb" id="httpd2" provider="heartbeat" type="httpd">
>           <operations>
>             <op id="jboss2_mon" interval="30s" name="monitor" timeout="20s"/>
>           </operations>
>         </primitive>
>       </group>
>     </resources>
>     <constraints>
>       <rsc_location id="location_server1" rsc="node01">
>         <rule id="best_location_server1" score="100">
>           <expression attribute="node01" id="best_location_server1_expr" operation="eq" value="10.25.45.81"/>
>         </rule>
>       </rsc_location>
>       <rsc_location id="location_server2" rsc="node02">
>         <rule id="best_location_server2" score="100">
>           <expression attribute="node02" id="best_location_server2_expr" operation="eq"
Re: [Linux-HA] Heartbeat over VPN
Hi, On Wed, Jul 11, 2012 at 04:24:42AM +0700, Nanang Purnomo wrote: I want to implement a failover cluster server with heartbeat, but the problem is that I use a VPN network. Can heartbeat be run through two different networks? Sure. Just make sure that the port is open and that the various parameters fit your network. Now, if it's a two-node cluster, you need a stonith solution which runs over another, independent medium. If that's not possible, you'll need an arbitrator at a third site. Thanks, Dejan I hope you can give me a solution, please. Best Regards, Nanang ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
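Dejan's points (open the heartbeat port; make the timing parameters fit the network) might look like the following ha.cf fragment. This is only a sketch: the tunnel interface name, peer addresses, port, and timings are assumptions, not values from the thread.

```
# heartbeat over a VPN: unicast to the peer's VPN address
udpport 694
ucast tun0 10.8.0.1    # node A's VPN address (ignored by its owner)
ucast tun0 10.8.0.2    # node B's VPN address (ignored by its owner)
# VPN links add latency and jitter; keep the timeouts generous
keepalive 2
warntime 10
deadtime 30
initdead 60
```

The deadtime/warntime values in particular should be tuned after watching the actual round-trip behavior of the VPN under load.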
Re: [Linux-HA] Heartbeat question about multiple services
On Friday, 20 April 2012 12:42:16, sgm wrote: Hi, I have a question about heartbeat: if I have three services, apache, mysql and sendmail, and apache is down, heartbeat will switch all the services to the standby server, right? That depends on the configuration - it is certainly possible ... If so, how do I configure heartbeat to avoid this? You can configure your 2 services (mysql and sendmail, for example) with colocation constraints, or as a group - there are many possibilities. Did you already RTFM (read the f... manuals)? Very Appreciated. gm HTH Nikita Michalko ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
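Nikita's two options can be sketched in CRM shell syntax. This assumes a Pacemaker-style CRM; the resource names are invented for illustration.

```
# Option 1: a group - members start in order and always move together,
# so a failure of apache takes mysql and sendmail along on failover:
group allservices apache mysql sendmail

# Option 2: separate primitives - with no constraints between them, a
# failure of apache moves only apache; to tie just two services
# together, a single colocation constraint is enough:
colocation mysql-with-sendmail inf: mysql sendmail
```

To get the behavior sgm asked about (apache failing over without dragging the other services along), the key is simply not to group or colocate apache with mysql and sendmail.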
Re: [Linux-HA] Heartbeat question about multiple services
On Fri, 20 Apr 2012 12:42:16 CEST, sgm wrote: Hi, I have a question about heartbeat: if I have three services, apache, mysql and sendmail, and apache is down, heartbeat will switch all the services to the standby server, right? If so, how to configure heartbeat to avoid this happening? Very Appreciated. gm You may want to start from here: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/ -- RaSca Mia Mamma Usa Linux: Niente è impossibile da capire, se lo spieghi bene! ("Nothing is impossible to understand, if you explain it well!") ra...@miamammausalinux.org http://www.miamammausalinux.org ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat question about multiple services
On 4/20/2012 at 05:42 AM, sgm sgm...@yahoo.com.cn wrote: Hi, I have a question about heartbeat, if I have three services, apache, mysql and sendmail,if apache is down, heartbeat will switch all the services to the standby server, right? Maybe. It depends on how you have built and configured your cluster. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat strange behavior
Thanks Lars.. problem solved. I changed the asterisk init script to be idempotent. Regards, Douglas On Wed, May 2, 2012 at 9:25 AM, Lars Ellenberg lars.ellenb...@linbit.comwrote: On Mon, Apr 30, 2012 at 01:52:05PM -0300, Douglas Pasqua wrote: Hi friends, I create a linux ha solution using 2 nodes: node-a and node-b. My /etc/ha.d/ha.cf: use_logd yes keepalive 1 deadtime 90 warntime 5 initdead 120 bcast eth6 node node-a node node-b crm off auto_failback off My /etc/ha.d/haresources node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 service1 service2 service3 I booted the two nodes together. node-a become master and node-b become slave. After, I booted the node-a. Then node-b become master. When node-a return from boot, it become slave, because *auto_failback is off* i think. All as expected until here. As the node-a as a slave, I decide to halt the node-a (using halt command). Then heartbeat in node-b go standby and my cluster was down. The virtual ips was down too. I expected the node-b stay on. Why did this happen ? Some log from node2: Apr 30 00:02:57 node-b heartbeat: [3082]: info: Received shutdown notice from 'node-a'. Apr 30 00:02:57 node-b heartbeat: [3082]: info: Resources being acquired from node-a. Apr 30 00:02:57 node-b heartbeat: [4414]: debug: notify_world: setting SIGCHLD Handler to SIG_DFL Apr 30 00:02:57 node-b harc[4414]: [4428]: info: Running /etc/ha.d/rc.d/status status Apr 30 00:02:57 node-b heartbeat: [4416]: info: No local resources [/usr/share/heartbeat/ResourceManager listkeys node-b] to acquire. Apr 30 00:02:57 node-b heartbeat: [3082]: debug: StartNextRemoteRscReq(): child count 1 Apr 30 00:02:58 node-b ResourceManager[4462]: [4657]: debug: /etc/init.d/asterisk start done. RC=1 Apr 30 00:02:58 node-b ResourceManager[4462]: [4658]: ERROR: Return code 1 from /etc/init.d/asterisk Apr 30 00:02:58 node-b ResourceManager[4462]: [4659]: CRIT: Giving up resources due to failure of asterisk Because of the above error when starting asterisk. 
Maybe your asterisk init script is simply not idempotent. Maybe it is broken, or maybe there really was some problem trying to start asterisk. Apr 30 00:02:58 node-b ResourceManager[4462]: [4660]: info: Releasing resource group: node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 asterisk sincronismo notificacao Apr 30 00:02:58 node-b ResourceManager[4462]: [4670]: info: Running /etc/init.d/notificacao stop Apr 30 00:02:58 node-b ResourceManager[4462]: [4671]: debug: Starting /etc/init.d/notificacao stop Apr 30 00:02:58 node-b ResourceManager[4462]: [4694]: debug: /etc/init.d/notificacao stop done. RC=0 Apr 30 00:02:58 node-b ResourceManager[4462]: [4704]: info: Running /etc/init.d/sincronismo stop Apr 30 00:02:58 node-b ResourceManager[4462]: [4705]: debug: Starting /etc/init.d/sincronismo stop Apr 30 00:02:58 node-b ResourceManager[4462]: [4711]: debug: /etc/init.d/sincronismo stop done. RC=0 Apr 30 00:02:58 node-b ResourceManager[4462]: [4720]: info: Running /etc/init.d/asterisk stop Apr 30 00:02:58 node-b ResourceManager[4462]: [4721]: debug: Starting /etc/init.d/asterisk stop Apr 30 00:02:58 node-b ResourceManager[4462]: [4725]: debug: /etc/init.d/asterisk stop done. RC=0 Apr 30 00:02:58 node-b ResourceManager[4462]: [4741]: info: Running /etc/ha.d/resource.d/IPaddr x.x.x.x/24 stop Apr 30 00:02:58 node-b ResourceManager[4462]: [4742]: debug: Starting /etc/ha.d/resource.d/IPaddr x.x.x.x/24 stop Apr 30 00:03:29 node-b heartbeat: [3082]: info: node-b wants to go standby [foreign] Apr 30 00:03:39 node-b heartbeat: [3082]: WARN: No reply to standby request. Standby request cancelled. Apr 30 00:04:29 node-b heartbeat: [3082]: WARN: node node-a: is dead Apr 30 00:04:29 node-b heartbeat: [3082]: info: Dead node node-a gave up resources. Apr 30 00:04:29 node-b heartbeat: [3082]: info: Link node-a:eth6 dead. 
-- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
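Douglas fixed this by making the asterisk init script idempotent. A minimal sketch of what an idempotent "start" action means for a haresources-managed LSB script is below; the pid-file path and the daemon launch step are placeholders, not taken from the thread.

```shell
#!/bin/sh
# Sketch: an idempotent "start" action. Starting a service that is
# already running must succeed (exit 0) - otherwise heartbeat's
# ResourceManager sees RC != 0 and gives up the whole resource group,
# exactly as in the log above.
PIDFILE=${PIDFILE:-/var/run/mydaemon.pid}

start() {
    if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
        echo "already running"
        return 0          # not an error: the desired state is reached
    fi
    # ... launch the real daemon here; this sketch records its own pid ...
    echo $$ > "$PIDFILE"
    echo "started"
}
```

The same rule applies to "stop": stopping an already-stopped service should also exit 0.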
Re: [Linux-HA] Heartbeat Failover Configuration Question
On 23 Apr 2012, at 02:23, Net Warrior wrote: auto_failback on No. As far as I'm aware this is to control what happens when your initial node recovers. If you have 2 nodes, a and b, and a is active, but then fails, b will take over, but when a is fixed and recovers, heartbeat will 'fail back' to a automatically if this property is on. You might want this if a is a faster/better server. Marcus -- Marcus Bointon Synchromedia Limited: Creators of http://www.smartmessages.net/ UK info@hand CRM solutions mar...@synchromedia.co.uk | http://www.synchromedia.co.uk/ ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat Failover Configuration Question
Hi, Net Warrior! What version of HA/Pacemaker do you use? Did you already RTFM - e.g. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained - or: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch HTH Nikita Michalko On Monday, 23 April 2012 02:23:20, Net Warrior wrote: Hi There I configured heartbeat to fail over an IP address; if I, for example, shut down one node, the other takes its IP address - so far so good. Now my doubt is whether there is a way to configure it not to make the failover automatically and have someone run the failover manually. Can you provide any configuration example, please? Is this stanza the one that does the magic? auto_failback on Thanks for your time and support Best regards ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat Failover Configuration Question
Hi Nikita This is the version heartbeat-3.0.0-0.7 My aim is: if node1 is powered off or loses its ethernet connection, node2 won't make the failover automatically - I want to make it manually, but I could not find how to accomplish that. Thanks for your time and support Best regards 2012/4/23, Nikita Michalko michalko.sys...@a-i-p.com: Hi, Net Warrior! What version of HA/Pacemaker do you use? Did you already RTFM - e.g. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained - or: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch HTH Nikita Michalko On Monday, 23 April 2012 02:23:20, Net Warrior wrote: Hi There I configured heartbeat to fail over an IP address; if I, for example, shut down one node, the other takes its IP address - so far so good. Now my doubt is whether there is a way to configure it not to make the failover automatically and have someone run the failover manually. Can you provide any configuration example, please? Is this stanza the one that does the magic? auto_failback on Thanks for your time and support Best regards ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat Failover Configuration Question
Why even use heartbeat then - just manually ifconfig the interface. On 4/23/12 7:39 AM, Net Warrior wrote: Hi Nikita This is the version heartbeat-3.0.0-0.7 My aim is: if node1 is powered off or loses its ethernet connection, node2 won't make the failover automatically - I want to make it manually, but I could not find how to accomplish that. Thanks for your time and support Best regards 2012/4/23, Nikita Michalko michalko.sys...@a-i-p.com: Hi, Net Warrior! What version of HA/Pacemaker do you use? Did you already RTFM - e.g. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained - or: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch HTH Nikita Michalko On Monday, 23 April 2012 02:23:20, Net Warrior wrote: Hi There I configured heartbeat to fail over an IP address; if I, for example, shut down one node, the other takes its IP address - so far so good. Now my doubt is whether there is a way to configure it not to make the failover automatically and have someone run the failover manually. Can you provide any configuration example, please? Is this stanza the one that does the magic? auto_failback on Thanks for your time and support Best regards ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat Failover Configuration Question
True, but even the most expensive software, like Veritas Cluster or Red Hat Cluster, lets me configure how I want to fail over the resources (auto or manual); hence my curiosity about accomplishing the same here. Thanks for your time Best Regards 2012/4/23, David Coulson da...@davidcoulson.net: Why even use heartbeat then - just manually ifconfig the interface. On 4/23/12 7:39 AM, Net Warrior wrote: Hi Nikita This is the version heartbeat-3.0.0-0.7 My aim is: if node1 is powered off or loses its ethernet connection, node2 won't make the failover automatically - I want to make it manually, but I could not find how to accomplish that. Thanks for your time and support Best regards 2012/4/23, Nikita Michalko michalko.sys...@a-i-p.com: Hi, Net Warrior! What version of HA/Pacemaker do you use? Did you already RTFM - e.g. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained - or: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch HTH Nikita Michalko On Monday, 23 April 2012 02:23:20, Net Warrior wrote: Hi There I configured heartbeat to fail over an IP address; if I, for example, shut down one node, the other takes its IP address - so far so good. Now my doubt is whether there is a way to configure it not to make the failover automatically and have someone run the failover manually. Can you provide any configuration example, please? Is this stanza the one that does the magic?
auto_failback on Thanks for your time and support Best regards ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat Failover Configuration Question
On 04/23/2012 01:47 PM, Net Warrior wrote: True, but even the most expensive software, like Veritas Cluster or Red Hat Cluster, lets me configure how I want to fail over the resources (auto or manual); hence my curiosity about accomplishing the same here. With the help of the meatware stonith plugin, a manual acknowledgement of the failover process is required. Regards, Andreas -- Need help with Pacemaker? http://www.hastexo.com/now Thanks for your time Best Regards 2012/4/23, David Coulson da...@davidcoulson.net: Why even use heartbeat then - just manually ifconfig the interface. On 4/23/12 7:39 AM, Net Warrior wrote: Hi Nikita This is the version heartbeat-3.0.0-0.7 My aim is: if node1 is powered off or loses its ethernet connection, node2 won't make the failover automatically - I want to make it manually, but I could not find how to accomplish that. Thanks for your time and support Best regards 2012/4/23, Nikita Michalko michalko.sys...@a-i-p.com: Hi, Net Warrior! What version of HA/Pacemaker do you use? Did you already RTFM - e.g. http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained - or: http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch HTH Nikita Michalko On Monday, 23 April 2012 02:23:20, Net Warrior wrote: Hi There I configured heartbeat to fail over an IP address; if I, for example, shut down one node, the other takes its IP address - so far so good. Now my doubt is whether there is a way to configure it not to make the failover automatically and have someone run the failover manually. Can you provide any configuration example, please? Is this stanza the one that does the magic?
auto_failback on Thanks for your time and support Best regards ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
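Andreas's suggestion can be sketched in CRM shell syntax. This is an illustration only: the resource name and hostlist are invented. The meatware plugin ships with cluster-glue; a fencing request through it blocks until an operator confirms it manually with meatclient, which provides the manual acknowledgement step.

```
# Meatware stonith resource: fencing waits for a human, who confirms
# with e.g.:  meatclient -c node1
primitive st-meat stonith:meatware \
    params hostlist="node1 node2"
clone fencing-clone st-meat
```

Note that this gates recovery (fencing) on an operator, which is the closest heartbeat/Pacemaker comes to the "manual failover" mode of Veritas or Red Hat Cluster.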
Re: [Linux-HA] heartbeat doesnt create the socket /var/run/heartbeat/register
The heartbeat I install is from debian packages. dpkg -l | grep heartbeat ii heartbeat 1:3.0.3-2~bpo50+1 Subsystem for High-Availability Linux ii libheartbeat2 1:3.0.3-2~bpo50+1 Subsystem for High-Availability Linux (libraries) version 3.0.2 I install the same packages and builds on all devices. I have an automatic installation. Some devices are installed ok and some suffers from the problem that the socket isn't created. Is there a way I can create the socket from outside heartbeat (from perl or bash)? I have a watchdog and I wish to create the socket automatically in case the socket doesn't exist. -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Lars Ellenberg Sent: Friday, January 20, 2012 8:48 PM To: linux-ha@lists.linux-ha.org Subject: Re: [Linux-HA] heartbeat doesnt create the socket /var/run/heartbeat/register On Thu, Jan 19, 2012 at 02:18:53PM +, Efrat Lefeber wrote: Hi, I am using linux-ha heartbeat on a two simple nodes cluster. For some reason which I can't figure out, the socket /var/run/heartbeat/register is not created though the directory /var/run/heartbeat/ exist: ll /var/run/heartbeat/ total 24 drwxr-x--- 6 hacluster haclient 4096 2012-01-19 14:30 . drwxr-xr-x 16 root root 4096 2012-01-19 14:30 .. drwxr-x--- 2 hacluster haclient 4096 2012-01-19 14:30 ccm drwxr-x--- 2 hacluster haclient 4096 2012-01-19 14:30 crm drwxr-x--- 2 hacluster haclient 4096 2012-01-19 14:30 dopd drwxr-xr-t 2 root root 4096 2012-01-19 14:30 rsctmp /etc/init.d/heartbeat status heartbeat OK [pid 14685 et al] is running on vs-158 [vs-158]... cl_status hbstatus Heartbeat is stopped on this machine. I ran cl_status with strace and I saw this error: connect(3, {sa_family=AF_FILE, path=/var/run/heartbeat/register...}, 110) = -1 ENOENT (No such file or directory) Who created this socket? 
That's one of the first things the heartbeat binary does when it starts; if it cannot create that socket, heartbeat will not even start up. Of course, in theory someone may remove that socket after it was created. If so, make sure that does not happen again ;) How can I find out why the socket isn't created? Where did you get your packages/binaries? Double-check your build? lsof -n -p <pid of your heartbeat master control process>? Is there a workaround I can do to create the socket? Fix your installation. This problem doesn't happen all the time. I have another node with the same configuration and the socket was created there. Same packages and build? -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
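As Lars points out, the register socket is created by heartbeat itself and cannot usefully be created from outside it, so the most a watchdog can do is detect that the socket is missing and restart heartbeat. A shell sketch (the socket path is from the report; the restart action in the comment is a hypothetical example):

```shell
# Check whether heartbeat's registration socket exists and really is
# a socket (not a leftover regular file).
REGISTER_SOCK=${REGISTER_SOCK:-/var/run/heartbeat/register}

register_socket_ok() {
    # test -S is true only for a socket
    [ -S "$REGISTER_SOCK" ]
}

# Hypothetical watchdog use, e.g. from cron:
#   register_socket_ok || /etc/init.d/heartbeat restart
```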
Re: [Linux-HA] [Heartbeat][Pacemaker] VIP doesn't swith to other server
Hello Mathieu, On 11/17/2011 07:22 PM, SEILLIER Mathieu wrote: Hi all, I have to use Heartbeat with Pacemaker for High Availability between 2 Tomcat 5.5 servers under Linux RedHat 5.4. The first server is active, the other one is passive. The master is called servappli01, with IP address 186.20.100.81; the slave is called servappli02, with IP address 186.20.100.82. I configured a virtual IP 186.20.100.83. Tomcat is not launched when a server starts; it is Heartbeat that starts Tomcat when it is running. All seems to be OK, each server sees the other as active, and the crm_mon command shows this:

Last updated: Thu Nov 17 19:03:34 2011
Stack: Heartbeat
Current DC: servappli01 (bf8e9a46-8691-4838-82d9-942a13aeedca) - partition with quorum
Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
2 Nodes configured, 2 expected votes
2 Resources configured.

Online: [ servappli01 servappli02 ]

Clone Set: ClusterIPClone (unique)
    ClusterIP:0 (ocf::heartbeat:IPaddr2): Started servappli01
    ClusterIP:1 (ocf::heartbeat:IPaddr2): Started servappli02

You did not configure just a simple VIP but a cluster IP, which acts like a simple static load balancer ... man iptables ... search for CLUSTERIP. If this was not your intention, simply don't clone it. If you do want a cluster IP, you have to choose the correct meta attributes:

clone ClusterIPClone ClusterIP \
    meta globally-unique="true" clone-node-max="2" interleave="true"

Clone Set: TomcatClone (unique)
    Tomcat:0 (ocf::heartbeat:tomcat): Started servappli01
    Tomcat:1 (ocf::heartbeat:tomcat): Started servappli02

The 2 Tomcat servers are identical, and the same webapps are deployed on each server in order to be able to access the webapps on the other server if one is down. By default, requests from clients are processed by the first server because it's the master. My problem is that when I crash the Tomcat on the first server, requests from clients are not redirected to the second server.
For a while, requests are not processed; then Heartbeat restarts Tomcat itself and requests are processed again by the first server. Requests are never forwarded to the second Tomcat if the first is down. The default behavior on monitoring errors is a local restart. If you always test from the same IP, I would expect your requests to fail while Tomcat is not running on the node you are redirected to ... so if you choose the clusterip_hash sourceip-sourceport, your chance should be 50/50 to get redirected ... if you want a real load balancer, you might want to integrate a service like ldirectord with real-server checks to remove a non-working service from the load balancing. ... use "ip addr show" or define a label to see your VIP ... Regards, Andreas -- Need help with Pacemaker? http://www.hastexo.com/now Here is my configuration: ha.cf file (the same on each server):

crm respawn
logfacility local0
logfile /var/log/ha-log
debugfile /var/log/ha-debug
warntime 10
deadtime 20
initdead 120
keepalive 2
autojoin none
node servappli01
node servappli02
ucast eth0 186.20.100.81 # ignored by node1 (owner of ip)
ucast eth0 186.20.100.82 # ignored by node2 (owner of ip)

cib.xml file (the same on each server):

<?xml version="1.0" ?>
<cib admin_epoch="0" crm_feature_set="3.0.1" dc-uuid="bf8e9a46-8691-4838-82d9-942a13aeedca" epoch="127" have-quorum="1" num_updates="51" validate-with="pacemaker-1.0">
  <configuration>
    <crm_config>
      <cluster_property_set id="cib-bootstrap-options">
        <nvpair id="cib-bootstrap-options-dc-version" name="dc-version" value="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87"/>
        <nvpair id="cib-bootstrap-options-cluster-infrastructure" name="cluster-infrastructure" value="Heartbeat"/>
        <nvpair id="cib-bootstrap-options-expected-quorum-votes" name="expected-quorum-votes" value="2"/>
        <nvpair id="cib-bootstrap-options-no-quorum-policy" name="no-quorum-policy" value="ignore"/>
        <nvpair id="cib-bootstrap-options-stonith-enabled" name="stonith-enabled" value="false"/>
      </cluster_property_set>
    </crm_config>
    <nodes>
      <node id="489a0305-862a-4280-bce5-6defa329df3f" type="normal" uname="servappli01"/>
      <node id="bf8e9a46-8691-4838-82d9-942a13aeedca" type="normal" uname="servappli02"/>
    </nodes>
    <resources>
      <clone id="TomcatClone">
        <meta_attributes id="TomcatClone-meta_attributes">
          <nvpair id="TomcatClone-meta_attributes-globally-unique" name="globally-unique" value="true"/>
        </meta_attributes>
        <primitive class="ocf" id="Tomcat" provider="heartbeat" type="tomcat">
          <instance_attributes id="Tomcat-instance_attributes">
            <nvpair id="Tomcat-instance_attributes-tomcat_name" name="tomcat_name" value="TomcatSBNG"/>
            <nvpair
Re: [Linux-HA] heartbeat and squid
Hi, On Thu, Sep 01, 2011 at 06:30:46PM +0200, Nicolas Repentin wrote: Hi all, I've got a question about heartbeat. How can I do this: if squid stops or is killed on node1, how do I make node2 become master? Currently, node2 becomes master only when node1 is down, or when the heartbeat service on node1 is down; but if I kill squid, nothing happens. I'm using CentOS 6 and the latest heartbeat version. Using just heartbeat and no pacemaker? Only pacemaker has service monitoring. Thanks, Dejan Thanks a lot for your responses! -- Nicolas ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
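Dejan's point - only Pacemaker monitors services - would look roughly like this in CRM shell syntax. This is a sketch: the resource name and timings are invented, and it assumes squid is managed through an LSB init script at /etc/init.d/squid.

```
# A monitored squid resource: the recurring monitor notices a dead
# squid and triggers recovery (local restart first, then failover on
# repeated failure, depending on migration-threshold).
primitive p_squid lsb:squid \
    op monitor interval="30s" timeout="20s"
```

Plain heartbeat (haresources mode) only watches node and link liveness, which matches the behavior Nicolas observed: killing squid changes nothing.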
Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start
Mike, I checked the permissions and those are fine. If you look at the restart script I have given below, it does not touch the heartbeat lock file (*touch $LOCKDIR/$SUBSYS*) when heartbeat is restarted, and I believe that is the problem. Isn't it? Btw, we have a product for a web application, and as part of it we allow administrators to configure servers as redundant servers; underneath we use Linux-HA to set up the redundancy. Rahul ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start
Permission problem perhaps? Not really sure what you're doing, but the fact that you have users configuring the cluster (why do you do this, btw?) may be pointing to a permission issue. -mgb On 11-08-03 06:57 PM, Rahul Kanna wrote: Hi, Our system setup: Heartbeat 3.0.3, DRBD (to manage the file system; it is one of the resources managed by the CRM), Red Hat Linux, Pacemaker. We have built an application on top of Linux-HA for users to configure the cluster by giving the IP addresses of the nodes and to perform operations like restarting the system, changing host names, resolving split-brain scenarios, etc. In our application, we ran into a problem when we do a heartbeat restart for some operation and the user then does Restart System, which internally runs the command shutdown -r now. I believe this is due to the heartbeat LSB script, and I have explained the scenario below. Problem: In the heartbeat LSB script, restart neither removes nor touches the heartbeat lock file. On heartbeat start, the LSB script starts heartbeat and touches the lock file /var/lock/subsys/heartbeat. On heartbeat stop, the LSB script stops heartbeat and removes the lock file /var/lock/subsys/heartbeat. On heartbeat restart, the LSB script stops heartbeat and starts heartbeat, but DOES NOT remove or touch the lock file. We call heartbeat restart instead of heartbeat start from our script because we are not sure whether heartbeat is already running. So when heartbeat restart is called while heartbeat is NOT running, the LSB script tries to stop it (a no-op, since it is not running) and then just starts it, BUT after starting, the lock file is not touched (because restart skips that step). So now heartbeat is running (you can verify this by looking for the heartbeat process or with the heartbeat status command) but there is no /var/lock/subsys/heartbeat lock file. This lock file is what the init system uses to know which services it has to stop when the machine shuts down (shutdown -r now). 
When we run shutdown -r now, the init system thinks heartbeat is not running (because there is no lock file) and does not stop heartbeat properly. When the node comes back up, heartbeat is started but its state is not correct (because it was not stopped properly). Because of this, the node identifies itself as Primary even though the erstwhile Secondary node has become Primary in the meantime, and this causes split-brain. So I believe heartbeat restart should do exactly what heartbeat stop followed by heartbeat start does, which is not the case now. Can you please let me know if my understanding is correct and whether this is a bug in the heartbeat LSB script? Thanks for looking into it. I have given the relevant code from the heartbeat LSB script below.

File: /etc/init.d/heartbeat

  start)
        RunStartStop pre-start
        StartHA
        RC=$?
        echo
        if [ $RC -eq 0 ]
        then
                [ ! -d $LOCKDIR ] && mkdir -p $LOCKDIR
                touch $LOCKDIR/$SUBSYS
        fi
        RunStartStop post-start $RC
        ;;
  stop)
        RunStartStop pre-stop
        StopHA
        RC=$?
        echo
        if [ $RC -eq 0 ]
        then
                rm -f $LOCKDIR/$SUBSYS
        fi
        RunStartStop post-stop $RC
        ;;
  restart)
        sleeptime=`ha_parameter deadtime`
        StopHA
        echo
        echo -n "Waiting to allow resource takeover to complete:"
        sleep $sleeptime
        sleep 10 # allow resource takeover to complete (hopefully).
        echo_success
        echo
        StartHA
        echo
        ;;

___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
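One way to fix the behaviour described above (a sketch of the idea, not the shipped init script) is to make the restart case reuse the same stop/start paths, so the subsys lock file is removed and re-created exactly as it would be by separate stop and start calls. StartHA/StopHA are stubbed here, and /tmp/demo-lock stands in for /var/lock/subsys:

```shell
#!/bin/sh
# Sketch: restart implemented as stop-then-start so the lock file is
# always managed. StartHA/StopHA are stubs; the real script launches
# and stops the heartbeat daemons there.
LOCKDIR=/tmp/demo-lock
SUBSYS=heartbeat

StartHA() { :; }  # stub for the real start logic
StopHA()  { :; }  # stub for the real stop logic

do_start() {
    StartHA || return 1
    [ -d "$LOCKDIR" ] || mkdir -p "$LOCKDIR"
    touch "$LOCKDIR/$SUBSYS"       # lock file always created on start
}

do_stop() {
    StopHA || return 1
    rm -f "$LOCKDIR/$SUBSYS"       # lock file always removed on stop
}

do_restart() {
    do_stop
    sleep 1  # stand-in for the deadtime-based takeover wait in the original
    do_start
}

do_restart
ls -l "$LOCKDIR/$SUBSYS"
```

With this structure, calling restart while heartbeat is not running still leaves the lock file in place afterwards, so shutdown -r now stops heartbeat cleanly.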
Re: [Linux-HA] Heartbeat 3.0.3 stable version + RHEL 6.1: restart network will make heartbeat not send broadcasts
Hi: I'm using the Heartbeat 3.0.3 stable version on the RHEL 6.1 x64 platform, and found the following issue: if I restart the network service, heartbeat stops sending broadcast packets from port 694. That means the node never gets a chance to join the HA cluster again, short of restarting heartbeat. Details for setting up the cluster: 1. Compile heartbeat 3.0.3 from source and install it on 2 RHEL 6.1 x64 nodes: installer001 and rhel61. 2. Compile pacemaker 1.0.9 from source and install it on both nodes. 3. Configure /etc/ha.d/ha.cf and make sure both nodes show as Online in crm status. 4. Run tcpdump -i eth0 port 694; both nodes can be seen sending heartbeat broadcast packets.

Details of the configuration file:

  [root@rhel61 ~]# cat /etc/ha.d/ha.cf
  autojoin none
  bcast eth0
  warntime 5
  deadtime 15
  initdead 60
  keepalive 2
  node installer001
  node rhel61
  crm respawn

Then I restarted the network service on the backup node installer001 (or just ran ifdown eth0; ifup eth0). Node rhel61 immediately detected installer001 as offline, and installer001 detected rhel61 as offline. Running tcpdump -i eth0 port 694 on installer001 again shows rhel61 still sending broadcast packets, but no broadcast packets coming from installer001, even though the eth0 network is fully recovered by then. I tried exactly the same case on RHEL 5.6 (heartbeat 3.0.3) and it works well: after a network restart, the node can still send out broadcast packets. Thanks for your comments. --Lei ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat three node configuration
On Thu, Jun 9, 2011 at 11:54 PM, Ricardo F ri...@hotmail.com wrote: What is the configuration to create a three-node cluster? Essentially you need Pacemaker on top. haresources-based clusters were only designed for 2 nodes. I have this, but the servers bring up the shared IP at the same time:

  ha.cf
  logfacility local0
  keepalive 2
  deadtime 10
  warntime 5
  initdead 30
  auto_failback off
  ucast bond0 host1 host2 host3
  node host1
  node host2
  node host3

  haresources
  host1 192.168.1.10/24/bond0

I use heartbeat 3.0.3 on Debian squeeze on all of the nodes; they all have the others' IPs in /etc/hosts, and I can propagate the configuration with ha_propagate. Thanks ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
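For reference, a minimal sketch (my addition, not Ricardo's actual file) of what the ha.cf looks like when moving a three-node cluster to CRM mode, as the reply suggests; with crm respawn enabled, Pacemaker manages the resources and the haresources file is ignored:

```shell
# /etc/ha.d/ha.cf -- sketch for a three-node CRM-mode cluster
logfacility local0
keepalive 2
deadtime 10
warntime 5
initdead 30
node host1
node host2
node host3
crm respawn   # hand resource management to Pacemaker;
              # /etc/ha.d/haresources is then no longer consulted
```

The resources (the shared IP among them) are then configured in Pacemaker's CIB instead of haresources, which is what allows more than two nodes.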
Re: [Linux-HA] heartbeat step down after split brain scenario
Hi - thanks for the response. Dimitri Maziuk wrote: What do you mean by disconnecting: what's your failure scenario and how do you expect it to be handled? The disconnection is the loss of the intersite link, which interrupts heartbeat comms. In this case it's expected that both sites will acquire the resources and become active. However, what I want is for one of the sites to give up the resources again when it sees that the other site is back up. Dimitri Maziuk wrote: Running daemons are not guaranteed (arguably, expected) to notice when the network cable is unplugged. You have to monitor the link and restart all processes that bind()/listen() on the interface. If your nodes are at different sites, you need to also deal with the loss of link at the switch, gateway, etc., and figure out which one is still connected to the Internet -- and gets to keep the VIP. Which in general can't be done from the nodes themselves. Yes - in this case neither site has to be connected to the internet; this is more an internal load-balancing act between two connected sites in a customer's network. What I found is that by setting auto_failback on in ha.cf at both sites, the site/node listed in haresources keeps the resources when the link is re-established, and the other site releases them. This is the result I was looking for. Regards Jack -- View this message in context: http://old.nabble.com/heartbeat-step-down-after-split-brain-scenario-tp31858728p31884521.html Sent from the Linux-HA mailing list archive at Nabble.com. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
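The arrangement Jack describes can be sketched like this (an assumed fragment matching his description, not his actual files; the node name and address are placeholders):

```shell
# /etc/ha.d/ha.cf on both sites (sketch)
auto_failback on          # the node named first in haresources reclaims
                          # its resources once heartbeat comms return

# /etc/ha.d/haresources, identical on both sites (sketch)
# site-a-node is the preferred owner and keeps/reclaims the VIP
site-a-node IPaddr::192.0.2.10/24/eth0
```

After a split-brain both sites hold the resources, but once the link returns, the non-preferred site releases them because auto_failback on designates the haresources node as the owner.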
Re: [Linux-HA] heartbeat step down after split brain scenario
On 06/16/2011 04:28 AM, Jack Berg wrote: I have a two node cluster using heartbeat and haproxy. Unfortunately it is impossible to provide redundant heartbeat paths between the two nodes at different sites so it is possible for a failure to cause split brain. To evaluate the impact I tried disconnecting the two nodes and I found that both become active and both try to keep the VIPs after the link is restored. What do you mean by disconnecting: what's your failure scenario and how do you expect it to be handled? Running daemons are not guaranteed (arguably, expected) to notice when the network cable is unplugged. You have to monitor the link and restart all processes that bind()/listen() on the interface. If your nodes are at different sites, you need to also deal with the loss of link at the switch, gateway, etc., and figure out which one is still connected to the Internet -- and gets to keep the VIP. Which in general can't be done from the nodes themselves. Dima -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu signature.asc Description: OpenPGP digital signature ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat sends udp to whole network
Hi, On Mon, May 23, 2011 at 03:18:37PM -0700, Nulgor Wankevitch wrote: hi, heartbeat seems to be sending UDP on port 694 to the whole network segment, Do you use ucast or bcast? With the latter, which is broadcast, it's of course expected. If it happens with the former, then you must have gremlins in your network. Thanks, Dejan not just the linked host, and it is getting blocked by the firewall; how can I limit it? Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:22:19:21:f1:75:08:00 SRC=192.168.1.190 DST=192.168.1.255 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=42414 DPT=694 LEN=226 Any help appreciated, thank you, nulgor ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
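If bcast has to stay, the noise can at least be limited at each host's firewall: accept UDP/694 only when it comes from the peer node, and drop the rest. A sketch only; 192.168.1.190 is taken from the log above and stands in for the peer's real address, and the rules are written to a file here rather than applied (applying them needs root, e.g. one iptables -A call per line):

```shell
#!/bin/sh
# Sketch: generate firewall rules that restrict heartbeat's UDP/694
# traffic to the cluster peer. PEER is an assumed address; adjust per node.
PEER=192.168.1.190

cat <<EOF > /tmp/ha-udp694.rules
-A INPUT -i eth0 -p udp --dport 694 -s $PEER -j ACCEPT
-A INPUT -i eth0 -p udp --dport 694 -j DROP
EOF
cat /tmp/ha-udp694.rules
```

Other hosts on the segment would apply only the DROP rule, so the broadcast stops showing up in their logs while the two cluster nodes still hear each other.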
Re: [Linux-HA] heartbeat sends udp to whole network
Hi, thanks for the reply. When I use ucast things do not seem to work: the nodes are able to bring up the VIP but not any services. When using bcast things seem to work correctly, but there is that broadcast problem. I would like to firewall the broadcast and isolate it to the local machine and the 2nd node; however I do not want to cause additional problems. Please advise, thanks. nulgor On 5/24/2011 1:52 AM, Dejan Muhamedagic wrote: Hi, On Mon, May 23, 2011 at 03:18:37PM -0700, Nulgor Wankevitch wrote: hi, heartbeat seems to be sending UDP on port 694 to the whole network segment, Do you use ucast or bcast? With the latter, which is broadcast, it's of course expected. If it happens with the former, then you must have gremlins in your network. Thanks, Dejan not just the linked host, and it is getting blocked by the firewall; how can I limit it? Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:22:19:21:f1:75:08:00 SRC=192.168.1.190 DST=192.168.1.255 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=42414 DPT=694 LEN=226 any help, thank you, nulgor ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat sends udp to whole network
Hi, On Tue, May 24, 2011 at 02:12:12AM -0700, Nulgor Wankevitch wrote: Hi, thanks for the reply; when I use ucast things do not seem to work: the nodes are able to bring up the VIP but not any services. When using bcast things seem to work correctly Wow! You really do have gremlins somewhere. ucast cannot fail in just the way you described: either the nodes can communicate or they can't. Did you set the right IP address of the peer? Otherwise there must be some kind of network setup issue. Thanks, Dejan but there is that broadcast problem; I would like to firewall the broadcast and isolate it to the local machine and the 2nd node, however I do not want to cause additional problems. Please advise, thanks. nulgor On 5/24/2011 1:52 AM, Dejan Muhamedagic wrote: Hi, On Mon, May 23, 2011 at 03:18:37PM -0700, Nulgor Wankevitch wrote: hi, heartbeat seems to be sending UDP on port 694 to the whole network segment, Do you use ucast or bcast? With the latter, which is broadcast, it's of course expected. If it happens with the former, then you must have gremlins in your network. Thanks, Dejan not just the linked host, and it is getting blocked by the firewall; how can I limit it? Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:22:19:21:f1:75:08:00 SRC=192.168.1.190 DST=192.168.1.255 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=42414 DPT=694 LEN=226 any help, thank you, nulgor ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat sends udp to whole network
ya, gremlins, very reassuring, thanks. On 5/24/2011 2:42 AM, Dejan Muhamedagic wrote: Hi, On Tue, May 24, 2011 at 02:12:12AM -0700, Nulgor Wankevitch wrote: Hi, thanks for the reply; when I use ucast things do not seem to work: the nodes are able to bring up the VIP but not any services. When using bcast things seem to work correctly Wow! You really do have gremlins somewhere. ucast cannot fail in just the way you described: either the nodes can communicate or they can't. Did you set the right IP address of the peer? Otherwise there must be some kind of network setup issue. Thanks, Dejan but there is that broadcast problem; I would like to firewall the broadcast and isolate it to the local machine and the 2nd node, however I do not want to cause additional problems. Please advise, thanks. nulgor On 5/24/2011 1:52 AM, Dejan Muhamedagic wrote: Hi, On Mon, May 23, 2011 at 03:18:37PM -0700, Nulgor Wankevitch wrote: hi, heartbeat seems to be sending UDP on port 694 to the whole network segment, Do you use ucast or bcast? With the latter, which is broadcast, it's of course expected. If it happens with the former, then you must have gremlins in your network. Thanks, Dejan not just the linked host, and it is getting blocked by the firewall; how can I limit it? Firewall: *UDP_IN Blocked* IN=eth0 OUT= MAC=ff:ff:ff:ff:ff:ff:00:22:19:21:f1:75:08:00 SRC=192.168.1.190 DST=192.168.1.255 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=42414 DPT=694 LEN=226 any help, thank you, nulgor ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat sends udp to whole network
On 05/24/2011 05:48 AM, Nulgor Wankevitch wrote: ya, gremlins, very reassuring, thanks. If the broadcast packets from host A are seen by host B, and unicast packets from host A to host B are not seen by host B, then your universe is governed by laws of physics we here are completely unfamiliar with. Sometimes we call them gremlins. HTH Dima -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu signature.asc Description: OpenPGP digital signature ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat sends udp to whole network
I think you guys might have jumped the gun on me; why would you assume it is not seen? I reported that it brings up the VIP but not the services. nulgor On 5/24/2011 9:37 AM, Dimitri Maziuk wrote: On 05/24/2011 05:48 AM, Nulgor Wankevitch wrote: ya, gremlins, very reassuring, thanks. If the broadcast packets from host A are seen by host B, and unicast packets from host A to host B are not seen by host B, then your universe is governed by laws of physics we here are completely unfamiliar with. Sometimes we call them gremlins. HTH Dima ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat sends udp to whole network
On 05/24/2011 02:56 PM, Nulgor Wankevitch wrote: I think you guys might have jumped the gun on me, why would you assume it is not seen? I reported it will bring up the VIP but not the services. The only way I can vaguely imagine that possibly happening is if cib isn't propagated to the other node(s) due to, indeed, a problem with comms channel. However, I can think of only one way to make that happen over unicast but not broadcast: unicasting to a wrong host. Dima -- Dimitri Maziuk Programmer/sysadmin BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu signature.asc Description: OpenPGP digital signature ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
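Dima's "unicasting to a wrong host" hunch is easy to check mechanically: list the ucast targets in ha.cf and try to reach each one. A sketch only; it creates a sample ha.cf in /tmp for illustration (with 127.0.0.1 as a stand-in target), whereas on a real node you would point CF at /etc/ha.d/ha.cf:

```shell
#!/bin/sh
# Sketch: extract the ucast targets from an ha.cf and ping each one.
# CF points at a generated sample here; use /etc/ha.d/ha.cf on a real node.
CF=/tmp/sample-ha.cf
OUT=/tmp/ucast-check.out
cat <<'EOF' > "$CF"
ucast eth0 127.0.0.1
EOF
: > "$OUT"
# The third field of each "ucast" line is the peer address.
awk '$1 == "ucast" { print $3 }' "$CF" | while read -r ip; do
    if ping -c 1 -W 1 "$ip" >/dev/null 2>&1; then
        echo "$ip reachable" >> "$OUT"
    else
        echo "$ip NOT reachable" >> "$OUT"
    fi
done
cat "$OUT"
```

A target that is unreachable, or that is not actually the peer node's address, would explain heartbeat comms half-working over ucast.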
Re: [Linux-HA] heartbeat sends udp to whole network
It seems like the cib is on both nodes, as I am able to view it from crm_mon on both, and crm configure show shows the same info; am I correct? On 5/24/2011 2:02 PM, Dimitri Maziuk wrote: On 05/24/2011 02:56 PM, Nulgor Wankevitch wrote: I think you guys might have jumped the gun on me, why would you assume it is not seen? I reported it will bring up the VIP but not the services. The only way I can vaguely imagine that possibly happening is if the cib isn't propagated to the other node(s) due to, indeed, a problem with the comms channel. However, I can think of only one way to make that happen over unicast but not broadcast: unicasting to a wrong host. Dima ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] heartbeat sends udp to whole network
On Tue, May 24, 2011 at 02:10:25PM -0700, Nulgor Wankevitch wrote: it seems like cib is on both nodes as I am able to view both from crm_mon and crm configure show shows the same info, am I correct? This does not lead anywhere. You complained that broadcast broadcasts. Well, that's the nature of it. Then use unicast. But unicast does not work for me. Some talk about gremlins... Let's skip that. So: why does unicast seem not to work for you? Maybe provide logs? E.g. an hb_report from starting up the nodes configured with unicast, up to them bringing up some, but not all, of the resources? And then we go from there. BTW, you can ask heartbeat directly what it thinks about its comm channels:

  for node in $(cl_status listnodes); do
      for link in $(cl_status listhblinks $node); do
          linkstatus=$(cl_status hblinkstatus $node $link)
          printf "%s\t%s\t%s\n" "$node" "$link" "$linkstatus"
      done
  done

We should add a pretty-print-all-known-link-states option to cl_status... -- : Lars Ellenberg : LINBIT | Your Way to High Availability : DRBD/HA support and consulting http://www.linbit.com DRBD® and LINBIT® are registered trademarks of LINBIT, Austria. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Heartbeat kills itself
On 05/05/2011 11:45 AM, Lacoco, Joshua wrote: Hello, I have a 2-node cluster on RHEL 5.4. I am currently only running the heartbeat service on one node, because the heartbeat service kills itself and I'm trying to avoid downtime/split-brain issues. I've searched and found posts with similar problems. I am running heartbeat 3.0.2-1. Below are the same messages I am getting (from a different post). Does anyone know if this is a known issue, or can point me in the right direction? I'm stumped. Hello. I had a similar problem on RHEL 5.4 with heartbeat 2.3 and heartbeat 3 (I don't remember the exact software version); the only thing that fixed it was to download a recent kernel version and substitute it for the original one. Hope this helps. Andrea. -- Andrea Bertucci Aitek S.p.A. - Via della Crocetta, 15 - I-16122 Genova tel.: +39 010 846731 fax: +39 010 8467350 - e-mail: abertu...@aitek.it --- This e-mail and any files transmitted with it are confidential and intended solely for the use of the individual to whom it is addressed. If you have received this email in error please send it back. Unauthorized publication, use, disclosure, forwarding, printing or copying of this email and its associated attachments is strictly prohibited. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(
On 11-04-22 06:25 AM, SEILLIER Mathieu wrote: Hi all, First, I'm French, so apologies in advance for my English... I have to use Heartbeat for high availability between 2 Tomcat 5.5 servers under Linux RedHat 5.3. The first server is active, the other one is passive. The master is called servappli01, with IP address 186.20.100.40; the slave is called servappli02, with IP address 186.20.100.39. I configured a virtual IP, 186.20.100.41. Tomcat is not launched when a server boots; it is Heartbeat that starts Tomcat when it is running. My problem is: when heartbeat is started on the first server, then on the second server, the VIP is assigned to both servers! Also, Tomcat is started on each server, and each node sees the other node as dead! Here is my configuration.

ha.cf file (the same on each server):

  logfile /var/log/ha-log
  debugfile /var/log/ha-debug
  logfacility none
  keepalive 2
  warntime 6
  deadtime 10
  initdead 90
  bcast eth0
  node servappli01 servappli02
  auto_failback yes
  respawn hacluster /usr/lib/heartbeat/ipfail
  apiauth ipfail gid=haclient uid=hacluster

haresources file (the same on each server):

  servappli01 IPaddr::186.20.100.41/24/eth0 tomcat

Result of the ifconfig command on the first server (servappli01):

  eth0   Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38
         inet adr:186.20.100.40  Bcast:186.20.100.255  Masque:255.255.255.0
         adr inet6: fe80::21e:bff:febb:c238/64 Scope:Lien
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:14404996 errors:0 dropped:0 overruns:0 frame:0
         TX packets:6580505 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 lg file transmission:1000
         RX bytes:385833 (3.5 GiB)  TX bytes:2694953468 (2.5 GiB)
         Interruption:177 Memoire:fa00-fa012100

  eth0:0 Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38
         inet adr:186.20.100.41  Bcast:186.20.100.255  Masque:255.255.255.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         Interruption:177 Memoire:fa00-fa012100

Result of the ifconfig command on the second server (servappli02) at the same time:

  eth0   Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C
         inet adr:186.20.100.39  Bcast:186.20.100.255  Masque:255.255.255.0
         adr inet6: fe80::21e:bff:fe77:c90c/64 Scope:Lien
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:23815049 errors:0 dropped:0 overruns:0 frame:0
         TX packets:17441845 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 lg file transmission:1000
         RX bytes:2620027933 (2.4 GiB)  TX bytes:3595896739 (3.3 GiB)
         Interruption:177 Memoire:fa00-fa012100

  eth0:0 Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C
         inet adr:186.20.100.41  Bcast:186.20.100.255  Masque:255.255.255.0
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         Interruption:177 Memoire:fa00-fa012100

Result of the /usr/bin/cl_status listnodes command (on each server): servappli02 servappli01. Result of /usr/bin/cl_status nodestatus servappli01 on servappli01: active. Result of /usr/bin/cl_status nodestatus servappli02 on servappli01: dead. Result of /usr/bin/cl_status nodestatus servappli01 on servappli02: dead. Result of /usr/bin/cl_status nodestatus servappli02 on servappli02: active. And of course, if I kill Tomcat on the master server, there is no switch to the second server (a call to a webapp using the VIP doesn't work). Can somebody help me please? I guess there is something wrong but I don't know what! Thanx Mathieu ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems It almost sounds like the nodes are unaware of each other. Could be a network thing maybe. Here are some things to try: Can you ssh or ping one node from the other? Bring up one node with the VIP running - leave the other node up but heartbeat down. Can you ping the VIP from the node NOT running HA? What happens when you look at the cluster when both nodes are running - use the crm_mon command and paste what you see in here. I'm thinking you have some sort of network issue. 
___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
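The checks suggested in the reply above can be collected into one script. A sketch only: the peer hostname and the VIP are the ones from this thread and stand in for your own, and a missing command or unreachable host is logged rather than aborting the run:

```shell
#!/bin/sh
# Sketch: run the suggested connectivity/cluster checks and log results.
# servappli02 and 186.20.100.41 are the thread's example peer and VIP.
LOG=/tmp/ha-checks.log
: > "$LOG"
run() { echo "== $*" >> "$LOG"; "$@" >> "$LOG" 2>&1 || echo "FAILED: $*" >> "$LOG"; }

run ping -c 1 -W 1 servappli02    # can this node reach its peer?
run ping -c 1 -W 1 186.20.100.41  # does the VIP answer?
run cl_status listnodes           # does heartbeat know both nodes?
run crm_mon -1                    # what does the cluster itself report?
cat "$LOG"
```

If the peer ping fails while both nodes are up, the "network thing" suspicion is confirmed before any heartbeat debugging is needed.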
Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(
Have you generated the authkey with the corosync-keygen command on one node and then copied that file to the other node? -Original Message- From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of mike Sent: Tuesday, April 26, 2011 5:41 PM To: General Linux-HA mailing list Subject: Re: [Linux-HA] [Heartbeat] my VIP doesn't work :( On 11-04-22 06:25 AM, SEILLIER Mathieu wrote: Hi all, First I'm french so sorry in advance for my English... I have to use Heartbeat for High Availability between 2 Tomcat 5.5 servers under Linux RedHat 5.3. The first server is active, the other one is passive. The master is called servappli01, with IP address 186.20.100.40, the slave is called servappli02, with IP address 186.20.100.39. I configured a virtual IP 186.20.100.41. Each Tomcat is not launched when server is started, this is Heartbeat which starts Tomcat when it's running. My problem is : When heartbeat is started on the first server, then on the second server, the VIP is assigned to the 2 servers ! also, Tomcat is started on each server, and each node see the other node as dead ! 
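For heartbeat itself (as opposed to corosync, which the reply mentions), the shared authentication key lives in /etc/ha.d/authkeys; it must be identical on both nodes and readable only by root. A minimal sketch, with a placeholder secret:

```shell
# /etc/ha.d/authkeys -- must be mode 0600 and identical on both nodes
auth 1
1 sha1 SomeSharedSecret
```

If the authkeys files differ between the nodes, each node rejects the other's packets and both see each other as dead, which matches the symptom described in this thread.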
Here is my configuration : ha.cf file (the same on each server) : logfile /var/log/ha-log debugfile /var/log/ha-debug logfacility none keepalive 2 warntime 6 deadtime 10 initdead 90 bcast eth0 node servappli01 servappli02 auto_failback yes respawn hacluster /usr/lib/heartbeat/ipfail apiauth ipfail gid=haclient uid=hacluster haresources file (the same on each server) : servappli01 IPaddr::186.20.100.41/24/eth0 tomcat Result of ifconfig command on the first server (servappli01) : eth0 Link encap:Ethernet HWaddr 00:1E:0B:BB:C2:38 inet adr:186.20.100.40 Bcast:186.20.100.255 Masque:255.255.255.0 adr inet6: fe80::21e:bff:febb:c238/64 Scope:Lien UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:14404996 errors:0 dropped:0 overruns:0 frame:0 TX packets:6580505 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 lg file transmission:1000 RX bytes:385833 (3.5 GiB) TX bytes:2694953468 (2.5 GiB) Interruption:177 Memoire:fa00-fa012100 eth0:0Link encap:Ethernet HWaddr 00:1E:0B:BB:C2:38 inet adr:186.20.100.41 Bcast:186.20.100.255 Masque:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interruption:177 Memoire:fa00-fa012100 Result of ifconfig command on the second server (servappli02) at the same time : eth0 Link encap:Ethernet HWaddr 00:1E:0B:77:C9:0C inet adr:186.20.100.39 Bcast:186.20.100.255 Masque:255.255.255.0 adr inet6: fe80::21e:bff:fe77:c90c/64 Scope:Lien UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:23815049 errors:0 dropped:0 overruns:0 frame:0 TX packets:17441845 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 lg file transmission:1000 RX bytes:2620027933 (2.4 GiB) TX bytes:3595896739 (3.3 GiB) Interruption:177 Memoire:fa00-fa012100 eth0:0Link encap:Ethernet HWaddr 00:1E:0B:77:C9:0C inet adr:186.20.100.41 Bcast:186.20.100.255 Masque:255.255.255.0 UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 Interruption:177 Memoire:fa00-fa012100 Result of /usr/bin/cl_status listnodes command (on each server) : servappli02 servappli01 
Result of /usr/bin/cl_status nodestatus servappli01 on servappli01: active. Result of /usr/bin/cl_status nodestatus servappli02 on servappli01: dead. Result of /usr/bin/cl_status nodestatus servappli01 on servappli02: dead. Result of /usr/bin/cl_status nodestatus servappli02 on servappli02: active. And of course, if I kill Tomcat on the master server, there is no switch to the second server (a call to a webapp using the VIP doesn't work). Can somebody help me please? I guess there is something wrong but I don't know what! Thanx Mathieu ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems It almost sounds like the nodes are unaware of each other. Could be a network thing maybe. Here are some things to try: Can you ssh or ping one node from the other? Bring up one node with the VIP running - leave the other node up but heartbeat down. Can you ping the VIP from the node NOT running HA? What happens when you look at the cluster when both nodes are running - use the crm_mon command and paste what you see in here. I'm thinking you have some sort of network issue. ___ Linux-HA mailing list Linux-HA@lists.linux-ha.org http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http