Re: [Linux-HA] Heartbeat packages for Redhat-7

2015-04-16 Thread Yogendramummaneni Prasad
Thank you Lars for clarifying.
I will see what I can use in our environment.

Thank you again,
Yogi.

On Fri, Apr 3, 2015 at 11:03 PM, Lars Ellenberg lars.ellenb...@linbit.com
wrote:

 On Wed, Apr 01, 2015 at 12:16:38PM +0100, Yogendramummaneni Prasad wrote:
  Hello,
 
  At the moment, we are using heartbeat on RedHat-5.8 with the following
  packages:
  [root@p118278vaps2011 ~]# rpm -qa|grep heartbeat
  heartbeat-2.1.4-11.el5
  heartbeat-stonith-2.1.4-11.el5
  heartbeat-pils-2.1.4-11.el5
 
 
  Now we are planning to upgrade the OS to Redhat-7.0.
  I could not find the same heartbeat packages for RedHat-7.0 on the
 internet.
 
  Could you please confirm if the heartbeat packages are available for
  RedHat-7.0

 Those versions are almost seven years old.

 You can use heartbeat 3.0.6 (if you only use haresources mode).

 If you use crm mode, be aware that the crm component
 was split off into its own project years ago: Pacemaker.

 For CRM mode, if you want to stick with heartbeat, you use
 heartbeat 3.0.6 and Pacemaker 1.1.12 (with LINBIT patches),
 or Pacemaker 1.1.13 (soon to be released, including those patches).

 If you don't have any particular reason to keep using heartbeat,
 the recommended cluster stack is Corosync + Pacemaker,
 which is what you get with the RHEL 7 native HA cluster.

 For more about Pacemaker, visit clusterlabs.org,
 subscribe to us...@clusterlabs.org,
 or join on freenode #clusterlabs


 --
 : Lars Ellenberg
 : http://www.LINBIT.com | Your Way to High Availability
 : DRBD, Linux-HA  and  Pacemaker support and consulting

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list is closing down.
Please subscribe to us...@clusterlabs.org instead.
http://clusterlabs.org/mailman/listinfo/users
___
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha

Re: [Linux-HA] Heartbeat packages for Redhat-7

2015-04-04 Thread Dimitri Maziuk

On 2015-04-03 17:03, Lars Ellenberg wrote:


You can use heartbeat 3.0.6 (if you only use haresources mode).


You can google for the ticket number, but basically the EPEL heartbeat 
maintainer replied to my RFA with "I don't use heartbeat anymore", so no. 
I meant to post that here but forgot.


So there is no heartbeat RPM for EL7 in the usual repos. Does 
ClusterLabs have one?


Dimitri



Re: [Linux-HA] Heartbeat packages for Redhat-7

2015-04-03 Thread Lars Ellenberg
On Wed, Apr 01, 2015 at 12:16:38PM +0100, Yogendramummaneni Prasad wrote:
 Hello,
 
 At the moment, we are using heartbeat on RedHat-5.8 with the following
 packages:
 [root@p118278vaps2011 ~]# rpm -qa|grep heartbeat
 heartbeat-2.1.4-11.el5
 heartbeat-stonith-2.1.4-11.el5
 heartbeat-pils-2.1.4-11.el5
 
 
 Now we are planning to upgrade the OS to Redhat-7.0.
 I could not find the same heartbeat packages for RedHat-7.0 on the internet.
 
 Could you please confirm whether heartbeat packages are available for
 RedHat-7.0?

Those versions are almost seven years old.

You can use heartbeat 3.0.6 (if you only use haresources mode).
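
For anyone unfamiliar with haresources mode, a minimal configuration looks
roughly like this (a sketch with assumed node names, addresses and services,
not taken from this thread):

```
# /etc/ha.d/ha.cf (on each node; ucast points at the peer)
ucast eth0 <peer-ip>
auto_failback on
node node1 node2

# /etc/ha.d/haresources (identical on both nodes)
# node1 is the preferred owner of the virtual IP and the service
node1 IPaddr::192.168.1.100/24 apache
```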

If you use crm mode, be aware that the crm component
was split off into its own project years ago: Pacemaker.

For CRM mode, if you want to stick with heartbeat, you use
heartbeat 3.0.6 and Pacemaker 1.1.12 (with LINBIT patches),
or Pacemaker 1.1.13 (soon to be released, including those patches).

If you don't have any particular reason to keep using heartbeat,
the recommended cluster stack is Corosync + Pacemaker,
which is what you get with the RHEL 7 native HA cluster.

For more about Pacemaker, visit clusterlabs.org,
subscribe to us...@clusterlabs.org,
or join #clusterlabs on freenode


-- 
: Lars Ellenberg
: http://www.LINBIT.com | Your Way to High Availability
: DRBD, Linux-HA  and  Pacemaker support and consulting

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


Re: [Linux-HA] Heartbeat packages for Redhat-7

2015-04-03 Thread Digimer
On 03/04/15 06:03 PM, Lars Ellenberg wrote:
 On Wed, Apr 01, 2015 at 12:16:38PM +0100, Yogendramummaneni Prasad wrote:
 Hello,

 At the moment, we are using heartbeat on RedHat-5.8 with the following
 packages:
 [root@p118278vaps2011 ~]# rpm -qa|grep heartbeat
 heartbeat-2.1.4-11.el5
 heartbeat-stonith-2.1.4-11.el5
 heartbeat-pils-2.1.4-11.el5


 Now we are planning to upgrade the OS to Redhat-7.0.
 I could not find the same heartbeat packages for RedHat-7.0 on the internet.

 Could you please confirm if the heartbeat packages are available for
 RedHat-7.0
 
 Those versions are almost seven years old.
 
 You can use heartbeat 3.0.6 (if you only use haresources mode).
 
 If you use crm mode, be aware that the crm component
 was split off into its own project years ago: Pacemaker.
 
 For CRM mode, if you want to stick with heartbeat, you use
 heartbeat 3.0.6 and Pacemaker 1.1.12 (with LINBIT patches),
 or Pacemaker 1.1.13 (soon to be released, including those patches).
 
 If you don't have any particular reason to keep using heartbeat,
 the recommended cluster stack is Corosync + Pacemaker,
 which is what you get with the RHEL 7 native HA cluster.
 
 For more about Pacemaker, visit clusterlabs.org,
 subscribe to us...@clusterlabs.org,
 or join on freenode #clusterlabs
 
 

To expand on/provide background to Lars' answer:

https://alteeve.ca/w/History_of_HA_Clustering

Also, please subscribe to the Clusterlabs mailing list, as shown in
Lars' footer.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?


Re: [Linux-HA] heartbeat

2015-01-20 Thread Ron Croonenberg

Hi Dimitri,

Yes, there are 4 pairs, but they are all active. When a node fails, the 
other one in the pair just takes everything over.
A different HA stack is not an option; it has to be heartbeat. I noticed 
something called ethmonitor; I can probably monitor an IB connection with 
it (it has IPoIB on it).




On 01/20/2015 12:51 PM, Dimitri Maziuk wrote:

On 01/20/2015 01:34 PM, Ron Croonenberg wrote:

Hello,

I have an Ethernet connection that connects all hosts in a cluster, and
the nodes also have an IB connection. I want the failover host to take
over when an IB connection goes down on a host. Is there an example of
how to do this? (I am using IPMI for shutting down hosts etc.)

A cluster I am using has 8 nodes, and I want to do failover in pairs.
In the ha.cf file, do I mention all the hosts or just the host and its
failover partner, per pair?


Do you have 4 separate active-passive pairs or a cluster of 8 nodes? If
it's the latter, I think you want pacemaker, not heartbeat. Dunno what
pacemaker might have for monitoring an IB connection; with heartbeat R1
I'd do something like grep for LinkUp in the output of ibstat.





Re: [Linux-HA] heartbeat

2015-01-20 Thread Dimitri Maziuk
On 01/20/2015 01:34 PM, Ron Croonenberg wrote:
 Hello,
 
 I have an Ethernet connection that connects all hosts in a cluster, and
 the nodes also have an IB connection. I want the failover host to take
 over when an IB connection goes down on a host. Is there an example of
 how to do this? (I am using IPMI for shutting down hosts etc.)
 
 A cluster I am using has 8 nodes, and I want to do failover in pairs.
 In the ha.cf file, do I mention all the hosts or just the host and its
 failover partner, per pair?

Do you have 4 separate active-passive pairs or a cluster of 8 nodes? If
it's the latter, I think you want pacemaker, not heartbeat. Dunno what
pacemaker might have for monitoring an IB connection; with heartbeat R1
I'd do something like grep for LinkUp in the output of ibstat.
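
To make that concrete, here is a sketch of such a check. The parsing is
factored out so it can be tested without IB hardware; the `Physical state:
LinkUp` string and the HCA/port names are assumptions about `ibstat` output,
so verify them against your own hardware first.

```shell
#!/bin/sh
# Sketch of an ibstat-based link check, as suggested above.
# Assumption: `ibstat` prints a "Physical state: LinkUp" line when the
# port is up. The HCA name and port number below are hypothetical.

check_ib_link() {
    # $1: captured ibstat output; returns 0 (success) when the link is up
    printf '%s\n' "$1" | grep -q 'Physical state: LinkUp'
}

# Demo with canned output; in real use something like:
#   check_ib_link "$(ibstat mlx4_0 1)" || /usr/share/heartbeat/hb_standby
sample='State: Active
Physical state: LinkUp
Rate: 56'

if check_ib_link "$sample"; then
    echo "IB link up"
else
    echo "IB link down"
fi
```

With heartbeat R1 this could run from cron or a small watchdog loop that
triggers a standby/failover when the check fails.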

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] Heartbeat in Amazon VMs does not create virtual IP address

2014-11-14 Thread David Vossel


- Original Message -
 Hi, I installed Heartbeat on CentOS 6.5 on two Amazon EC2 machines. This is the

If you have the option, I'd strongly recommend using the Pacemaker+CMAN
stack in RHEL 6.5. Red Hat began supporting Pacemaker in 6.5, so it
should be available to you.

-- Vossel

 version:
 [root@ip-10-0-2-68 ha.d]# rpm -qa | grep heartbeat
 heartbeat-libs-3.0.4-2.el6.x86_64
 heartbeat-3.0.4-2.el6.x86_64
 heartbeat-devel-3.0.4-2.el6.x86_64
 
 The floating IP is:
 [root@ip-10-0-2-68 ha.d]# cat haresources
 ip-10-0-2-68 10.0.2.70
 But it is not created on any machine, no matter where I run the
 takeover or standby commands.
 What am I missing? Is this even possible? These are my settings in ha.cf:
 # ha.cf on node 1 (ip-10-0-2-68):
 logfacility local0
 ucast eth0 10.0.2.69
 auto_failback on
 node ip-10-0-2-68 ip-10-0-2-69
 ping 10.0.2.1
 use_logd yes
 # ha.cf on node 2 (ip-10-0-2-69):
 logfacility local0
 ucast eth0 10.0.2.68
 auto_failback on
 node ip-10-0-2-68 ip-10-0-2-69
 ping 10.0.2.1
 use_logd yes
 
 This is the output of the route command:
 [root@ip-10-0-2-68 ha.d]# route -n
 Kernel IP routing table
 Destination     Gateway         Genmask         Flags Metric Ref Use Iface
 10.0.2.0        0.0.0.0         255.255.255.0   U     0      0   0   eth0
 0.0.0.0         10.0.2.1        0.0.0.0         UG    0      0   0   eth0
 [root@ip-10-0-2-68 ha.d]#
 
 This is how interface eth0 is set up on machine 1:
 [root@ip-10-0-2-68 ha.d]# ifconfig
 eth0  Link encap:Ethernet  HWaddr 12:23:49:EF:3A:53
   inet addr:10.0.2.68  Bcast:10.0.2.255  Mask:255.255.255.0
   inet6 addr: fe80::1023:49ff:feef:3a53/64 Scope:Link
   UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
   RX packets:269823 errors:0 dropped:0 overruns:0 frame:0
   TX packets:192305 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:167802149 (160.0 MiB)  TX bytes:48341828 (46.1 MiB)
   Interrupt:247
 
 
 These are the logs, showing everything going fine, but when running
 ifconfig the address is not there:
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: node
 ip-10-0-2-69: is dead
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Comm_now_up():
 updating status to active
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Local status
 now set to: 'active'
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: No STONITH
 device configured.
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: WARN: Shared disks
 are not protected.
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Resources being
 acquired from ip-10-0-2-69.
 Nov 11 21:37:39 ip-10-0-2-68 mach_down(default)[14769]: info:
 /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: mach_down
 takeover complete.
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Initial
 resource acquisition complete (mach_down)
 Nov 11 21:37:39 ip-10-0-2-68
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[14845]: INFO:
   Resource is stopped
 Nov 11 21:37:39 ip-10-0-2-68 heartbeat[14701]: [14701]: info: Local Resource
 acquisition completed.
 Nov 11 21:37:40 ip-10-0-2-68
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[14958]: INFO:
   Resource is stopped
 Nov 11 21:37:40 ip-10-0-2-68 IPaddr(IPaddr_10.0.2.70)[15057]: INFO:
 /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
 /var/run/resource-agents/send_arp-10.0.2.70 eth0 10.0.2.70 auto not_used
 not_used
 Nov 11 21:37:40 ip-10-0-2-68
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[15064]: INFO:
   Success
 Nov 11 21:37:49 ip-10-0-2-68 heartbeat[14681]: [14681]: info: Local Resource
 acquisition completed. (none)
 Nov 11 21:37:49 ip-10-0-2-68 heartbeat[14681]: [14681]: info: local resource
 transition completed.
 
 
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: node ip-10-0-2-68: is
 dead
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Comm_now_up():
 updating status to active
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Local status now set
 to: 'active'
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: No STONITH device
 configured.
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: WARN: Shared disks are not
 protected.
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18350]: info: Resources being
 acquired from ip-10-0-2-68.
 Nov 11 21:38:16 ip-10-0-2-69 heartbeat: [18360]: info: No local resources
 [/usr/share/heartbeat/ResourceManager listkeys ip-10-0-2-69] to acquire.
 Nov 11 21:38:17 ip-10-0-2-69
 /usr/lib/ocf/resource.d//heartbeat/IPaddr(IPaddr_10.0.2.70)[18441]: INFO:
   Resource is stopped
 Nov 11 21:38:17 ip-10-0-2-69 IPaddr(IPaddr_10.0.2.70)[18537]: INFO:
 /usr/libexec/heartbeat/send_arp -i 200 -r 5 -p
 /var/run/resource-agents/send_arp-10.0.2.70 eth0 10.0.2.70 auto not_used
 not_used
 Nov 11 21:38:17 ip-10-0-2-69
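
One likely reason the address never takes effect here (an inference, not
stated in this thread): the EC2 VPC network ignores the gratuitous ARP that
IPaddr/send_arp relies on, so a floating IP in a VPC generally has to be
moved through the EC2 API instead. A sketch of building that call, with a
hypothetical ENI ID:

```shell
# Sketch: move a secondary private IP between instances via the EC2 API
# instead of relying on ARP. The ENI ID below is hypothetical; a resource
# agent would look up the real one for the node taking over.
build_reassign_cmd() {
    # $1: network interface ID of the takeover node, $2: floating IP
    echo "aws ec2 assign-private-ip-addresses --network-interface-id $1 --private-ip-addresses $2 --allow-reassignment"
}

# A takeover hook would then run the resulting command:
build_reassign_cmd eni-0example 10.0.2.70
```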
 

Re: [Linux-HA] heartbeat 3.0.3 crashes if there are networking/multicast issues (ERROR: lowseq cannnot be greater than ackseq)

2014-06-30 Thread Pasi Kärkkäinen
On Thu, Jun 26, 2014 at 01:30:01PM +0200, Lars Ellenberg wrote:
 On Tue, Jun 24, 2014 at 11:20:48PM +0300, Pasi Kärkkäinen wrote:
  Hello!
  
  I've been seeing heartbeat cluster problems in Linux-based Vyatta and more 
  recent VyOS networking/router appliances.
  These are currently based on Debian Squeeze, and thus are using:
  
  Package: heartbeat
  Version: 1:3.0.3-2
 
 Please use 3.0.5:
 http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/37f57a36a2dd.tar.bz2
 

Do you think v3.0.5 fixes the issue of the heartbeat process crashing? 

This patch perhaps? http://hg.linux-ha.org/heartbeat-STABLE_3_0/rev/3e51db646a21


Thanks,

-- Pasi

  VyOS bug report: http://bugzilla.vyos.net/show_bug.cgi?id=244
  
  The problem is that when there are (unexpected) networking problems causing 
  multicast issues,
  which cause problems in the inter-cluster communications, the heartbeat 
  processes will die on the cluster nodes,
  which is bad, right? I assume heartbeat should never die, especially not 
  because of temporary networking issues..
  
  I've also seen heartbeat dying because of temporary network maintenance 
  breaks..
  
   Basically, first I'm seeing this kind of message:
  
  Jun 23 17:55:02 vyos03 heartbeat: [4119]: WARN: node vyos01: is dead
  Jun 23 17:59:23 vyos03 heartbeat: [4119]: CRIT: Cluster node vyos01 
  returning after partition.
  Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Deadtime value may be too 
  small.
  Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Late heartbeat: Node 
  vyos01: interval 273580 ms
  Jun 23 17:59:23 vyos03 harc[4961]: info: Running /etc/ha.d//rc.d/status 
  status
  Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Releasing resource 
  group: vyos01 IPaddr2-vyatta::10.0.0.10/24/eth1
  Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Running 
  /etc/ha.d/resource.d/IPaddr2-vyatta 10.0.0.10/24/eth1 stop
  Jun 23 17:59:26 vyos03 heartbeat: [4119]: WARN: 1 lost packet(s) for 
  [vyos01] [421:423]
  Jun 23 17:59:39 vyos03 heartbeat: [4119]: WARN: Logging daemon is disabled 
  --enabling logging daemon is recommended
  Jun 23 17:59:40 vyos03 harc[5102]: info: Running /etc/ha.d//rc.d/status 
  status
  
  Which seem normal in the case of networking problem.. But then later:
  
  Jun 23 19:31:22 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
  filling up (494 messages in queue)
  Jun 23 19:31:22 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
  filling up (495 messages in queue)
  Jun 23 19:31:23 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
  filling up (496 messages in queue)
  Jun 23 19:31:24 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
  filling up (497 messages in queue)
  Jun 23 19:31:24 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
  filling up (498 messages in queue)
  Jun 23 19:31:25 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
  filling up (499 messages in queue)
  Jun 23 19:31:26 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
  filling up (500 messages in queue)
  Jun 23 19:31:42 vyos03 heartbeat: last message repeated 25 times
  
  
  The hist queue size keeps increasing, and when it gets to 500 messages 
  bad things start happening..
  
  
  Jun 23 19:31:43 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
  filling up (500 messages in queue)
  Jun 23 19:31:49 vyos03 heartbeat: last message repeated 9 times
  Jun 23 19:31:49 vyos03 heartbeat: [10921]: ERROR: lowseq cannnot be greater 
  than ackseq
  Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Emergency Shutdown: Master 
  Control process died.
  Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10921 with 
  SIGTERM
  Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10924 with 
  SIGTERM
  Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10925 with 
  SIGTERM
  Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Emergency Shutdown(MCP 
  dead): Killing ourselves.
  
  At this point clustering has failed, because the heartbeat 
  services/processes aren't running anymore..
  
  Has anyone else seen this? 
 
  It was fixed years ago ...
 
  It seems the bug gets triggered at 500 messages in the hist queue,
  and then I always see the ERROR: lowseq cannnot be greater than ackseq 
  and then heartbeat dies..
 
 -- 
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com
 
 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


Re: [Linux-HA] heartbeat 3.0.3 crashes if there are networking/multicast issues (ERROR: lowseq cannnot be greater than ackseq)

2014-06-26 Thread Lars Ellenberg
On Tue, Jun 24, 2014 at 11:20:48PM +0300, Pasi Kärkkäinen wrote:
 Hello!
 
 I've been seeing heartbeat cluster problems in Linux-based Vyatta and more 
 recent VyOS networking/router appliances.
 These are currently based on Debian Squeeze, and thus are using:
 
 Package: heartbeat
 Version: 1:3.0.3-2

Please use 3.0.5:
http://hg.linux-ha.org/heartbeat-STABLE_3_0/archive/37f57a36a2dd.tar.bz2

 VyOS bug report: http://bugzilla.vyos.net/show_bug.cgi?id=244
 
 The problem is that when there are (unexpected) networking problems causing 
 multicast issues,
 which cause problems in the inter-cluster communications, the heartbeat 
 processes will die on the cluster nodes,
 which is bad, right? I assume heartbeat should never die, especially not 
 because of temporary networking issues..
 
 I've also seen heartbeat dying because of temporary network maintenance 
 breaks..
 
 Basically, first I'm seeing this kind of message:
 
 Jun 23 17:55:02 vyos03 heartbeat: [4119]: WARN: node vyos01: is dead
 Jun 23 17:59:23 vyos03 heartbeat: [4119]: CRIT: Cluster node vyos01 returning 
 after partition.
 Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Deadtime value may be too 
 small.
 Jun 23 17:59:23 vyos03 heartbeat: [4119]: WARN: Late heartbeat: Node vyos01: 
 interval 273580 ms
 Jun 23 17:59:23 vyos03 harc[4961]: info: Running /etc/ha.d//rc.d/status status
 Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Releasing resource group: 
 vyos01 IPaddr2-vyatta::10.0.0.10/24/eth1
 Jun 23 17:59:25 vyos03 ResourceManager[4991]: info: Running 
 /etc/ha.d/resource.d/IPaddr2-vyatta 10.0.0.10/24/eth1 stop
 Jun 23 17:59:26 vyos03 heartbeat: [4119]: WARN: 1 lost packet(s) for [vyos01] 
 [421:423]
 Jun 23 17:59:39 vyos03 heartbeat: [4119]: WARN: Logging daemon is disabled 
 --enabling logging daemon is recommended
 Jun 23 17:59:40 vyos03 harc[5102]: info: Running /etc/ha.d//rc.d/status status
 
 Which seems normal in the case of a networking problem... But then later:
 
 Jun 23 19:31:22 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
 filling up (494 messages in queue)
 Jun 23 19:31:22 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
 filling up (495 messages in queue)
 Jun 23 19:31:23 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
 filling up (496 messages in queue)
 Jun 23 19:31:24 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
 filling up (497 messages in queue)
 Jun 23 19:31:24 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
 filling up (498 messages in queue)
 Jun 23 19:31:25 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
 filling up (499 messages in queue)
 Jun 23 19:31:26 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
 filling up (500 messages in queue)
 Jun 23 19:31:42 vyos03 heartbeat: last message repeated 25 times
 
 
 The hist queue size keeps increasing, and when it gets to 500 messages bad 
 things start happening..
 
 
 Jun 23 19:31:43 vyos03 heartbeat: [10921]: ERROR: Message hist queue is 
 filling up (500 messages in queue)
 Jun 23 19:31:49 vyos03 heartbeat: last message repeated 9 times
 Jun 23 19:31:49 vyos03 heartbeat: [10921]: ERROR: lowseq cannnot be greater 
 than ackseq
 Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Emergency Shutdown: Master 
 Control process died.
 Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10921 with 
 SIGTERM
 Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10924 with 
 SIGTERM
 Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Killing pid 10925 with 
 SIGTERM
 Jun 23 19:31:50 vyos03 heartbeat: [10923]: CRIT: Emergency Shutdown(MCP 
 dead): Killing ourselves.
 
 At this point clustering has failed, because the heartbeat services/processes 
 aren't running anymore..
 
 Has anyone else seen this? 

It was fixed years ago ...

 It seems the bug gets triggered at 500 messages in the hist queue,
 and then I always see the ERROR: lowseq cannnot be greater than ackseq and 
 then heartbeat dies..

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


Re: [Linux-HA] Heartbeat Supported Version

2014-06-02 Thread Digimer
You should email Linbit (http://linbit.com) as they're the company that 
still supports the heartbeat package.


This said, if you are starting a new project, I strongly urge you to 
consider corosync + pacemaker. The heartbeat project has not been 
actively developed in quite some time, and there are no plans to restart 
development in the future.


Back in the day, heartbeat was one separate platform and Red Hat's RHCS 
was another. Over the years, this caused confusion and a lot of 
reinventing the wheel, so the two communities started work on merging 
into one common platform. The result is corosync + pacemaker, which is 
what all major developers are supporting from here on in.


If you're curious about the more detailed story, I've got a (still in 
progress) history here:


https://alteeve.ca/w/History_of_HA_Clustering

Again, it's not complete, but it does give a fairly good background on 
why heartbeat is not recommended anymore. It *is* still supported by 
Linbit though, so I'm not saying not to use it. Just consider the future. :)


digimer

On 02/06/14 11:07 AM, Venkata G Thota wrote:

Hello,

In our project we have a heartbeat cluster with version
heartbeat-2.1.4-0.24.9.

Is this a supported version?

Kindly advise how to get support for heartbeat cluster issues.
Regards



Venkata G Thota
UNIX Administrator, GTS Services Delivery - India
DLF IT PARK, Chennai, India
Phone: +91 44 434 25397 | Mobile: +91 99625 48884
e-mail: venkt...@in.ibm.com
Red Hat Certified Engineer

In happy moments, praise God. In difficult moments, seek God. In quiet
moments, worship God. In painful moments, trust God.
In every moment, thank God.









--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



Re: [Linux-HA] Heartbeat Supported Version

2014-06-02 Thread Lars Marowsky-Bree
On 2014-06-02T20:37:59, Venkata G Thota venkt...@in.ibm.com wrote:

 Hello,
 
 In our project we had the heartbeat cluster with version 
 heartbeat-2.1.4-0.24.9.
 
 Is it the supported version ?
 
 Kindly assist how to get support for heartbeat cluster issues.
 Regards

That looks like a fairly old heartbeat version from SUSE Linux
Enterprise Server 10 SP4.

SLES 10 has been out of general support since July 2013, but extended
support (https://www.suse.com/support/lc-faq.html#2) or LTSS is still
available.

Alternatively, the best option would be to upgrade to SLES 11 SP3 with
the HA extension.



Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-HA] Heartbeat Supported Version

2014-06-02 Thread Lars Marowsky-Bree
On 2014-06-02T12:04:23, Digimer li...@alteeve.ca wrote:

 You should email Linbit (http://linbit.com) as they're the company that
 still supports the heartbeat package.

For completeness, I doubt Linbit will support this version, since 2.1.4
from SLES 10 contains a number of backports from the pacemaker 0.7/1.0
series. While the source code is obviously available, I'd not suggest
inflicting this on Linbit ;-)


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde



Re: [Linux-HA] Heartbeat Supported Version

2014-06-02 Thread Digimer

On 02/06/14 06:30 PM, Lars Marowsky-Bree wrote:

On 2014-06-02T12:04:23, Digimer li...@alteeve.ca wrote:


You should email Linbit (http://linbit.com) as they're the company that
still supports the heartbeat package.


For completeness, I doubt Linbit will support this version, since 2.1.4
from SLES 10 contains a number of backports from the pacemaker 0.7/1.0
series. While source code is obviously available, I'd not suggest to
inflict this on Linbit ;-)


Regards,
 Lars


I didn't think they would support it, but I wanted to leave that for 
Linbit to say (I've been wrong enough times before...)


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



Re: [Linux-HA] Heartbeat Supported Version

2014-06-02 Thread Lars Ellenberg
On Mon, Jun 02, 2014 at 06:31:09PM -0400, Digimer wrote:
 On 02/06/14 06:30 PM, Lars Marowsky-Bree wrote:
 On 2014-06-02T12:04:23, Digimer li...@alteeve.ca wrote:
 
 You should email Linbit (http://linbit.com) as they're the company that
 still supports the heartbeat package.
 
 For completeness, I doubt Linbit will support this version, since 2.1.4
 from SLES 10 contains a number of backports from the pacemaker 0.7/1.0
 series. While source code is obviously available, I'd not suggest to
 inflict this on Linbit ;-)
 
 
 Regards,
  Lars
 
 I didn't think they would support it, but I wanted to leave that for
 Linbit to say (I've been wrong enough times before...)

Thanks, both of you ;-)

Yes, we maintain heartbeat.
We still occasionally bugfix (or even enhance) it if necessary.

That does not mean we support each and every legacy version of it
that happened to be bundled with some distribution at some point.

That's probably a matter of time and money,
and political constraints ...

If you really have to stay with whatever platform you have right now,
but need it to be supported for a very long term:
as that is a SuSE platform, ask SuSE what they can offer.

If you are basically happy with what you have,
but need just this one snag fixed,
describe your problem, and maybe someone will be able to
tell you what to do (but that won't go without mentioning
in every second sentence that you should probably upgrade).

If you are about to set up a new cluster,
go with current software.

Heartbeat itself is currently at 3.something,
and has in fact been pending a new release tag for ages...

Apart from a few important bugfixes
and some minor improvements to its inner workings,
the main difference between heartbeat 2.x and heartbeat 3 is
that the crm (cluster resource manager) part of it
was split out (years ago) and became Pacemaker.

Depending on what you have now, what you are used to doing,
what you feel most comfortable with, and what you want to achieve,
I see several options.

 * You are used to haresources, and in fact want to keep using it
   -> use current heartbeat 3.x packages,
   and keep doing whatever you did until now.
 * You have been using crm with heartbeat 2.x,
   or at least you now want to start using it
   -> you should upgrade to Pacemaker, which is just
   the natural evolution of the heartbeat crm component,
   even with the same lead developer still.
   *Several years* of evolution and improvements, in fact.

   You have further options now:

   * Keep the cluster communication and membership layer:
     heartbeat (3.x) + pacemaker
   * Change the cluster communication and membership layer:
     corosync (2.x) + pacemaker

   (and more, like cman and corosync 1.x...)

Recommendation for new clusters:
go with pacemaker (1.1.12 will be released soon)
and corosync (2.3.3 is it now?).

That's also about what you will get with current distributions
(rhel7, sles12).

(Though we at Linbit are still happy
 with heartbeat + pacemaker as well).

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


Re: [Linux-HA] Heartbeat Supported Version

2014-06-02 Thread Digimer

On 02/06/14 07:05 PM, Lars Ellenberg wrote:

(Though we at Linbit are still happy
  with heartbeat + pacemaker as well).


Heathens! HEATHENS!!

;)

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



Re: [Linux-HA] heartbeat failover

2014-01-27 Thread Bjoern.Becker
Hello Arnold,

yes, I recently found out that the sync rate was too high for our old firewall.
These are two datacenters, and all traffic is routed through this firewall. I 
don't know exactly why; that is the concept, somehow. 

Do you know how to force a different IP address on the other side? In heartbeat I 
was able to say that the cluster IP on one node is different from the one on the other node. 
In corosync/pacemaker I can't find such an example. 

Best regards
Björn 


-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Arnold Krille
Sent: Saturday, 25 January 2014 01:46
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover

On Thu, 23 Jan 2014 16:45:04 + bjoern.bec...@easycash.de wrote:
 Uhhh..I got the same configuration as the example config you sent me 
 now. But I cause high cpu load on our cisco asa firewall..
 
 I guess this traffic is not normal?
snip

When you want your cluster to repair failures _fast_, the components have to 
sync their state _fast_. So they have to talk a lot: not in terms of megabytes, 
but in terms of many small packets with low submission latency.

So yes, that traffic is normal. Why is there a firewall between your nodes on 
the network where the cluster traffic happens?

Have fun,

Arnold


Re: [Linux-HA] heartbeat failover

2014-01-24 Thread Bjoern.Becker
Hello,

I am running corosync successfully now. 

But I have a problem, because I have two different subnets and I don't know which 
ClusterIP I have to use. 
I have 10.128.61.0 and 10.128.62.0, so a ClusterIP like 10.128.61.61 will not 
be routed in 10.128.62.0. 

How can I use a different ClusterIP per side? 

Best regards
Björn 

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Becker, Björn
Sent: Thursday, 23 January 2014 17:45
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover

Uhhh.. I have the same configuration as the example config you sent me now. 
But I am causing high CPU load on our Cisco ASA firewall..

I guess this traffic is not normal?

root@node01:/etc/corosync# tcpdump dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode 
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 
17:41:06.093140 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.097327 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.113418 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.286517 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.291095 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.480221 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.484520 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.500608 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.673721 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.678654 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.867757 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.872492 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.888576 IP node01.52580 > node02.5405: UDP, length 82
17:41:07.061664 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.066304 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.255409 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.260512 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.275601 IP node01.52580 > node02.5405: UDP, length 82

Mit freundlichen Grüßen / Best regards
Björn 


-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Becker, Björn
Sent: Thursday, 23 January 2014 17:28
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover

Hi Lukas,

thank you. Well, I have to wait for some firewall changes for 5405/UDP. 

But I'm not sure whether what I'm doing is correct.

Node1:
interface {
member {
memberaddr: 10.128.61.60 # node 1 
}
member {
memberaddr: 10.128.62.60 # node 2
}
# The following values need to be set based on your environment 
ringnumber: 0
bindnetaddr: 10.128.61.0
mcastport: 5405
}
transport: udpu

Node2: 
interface {
member {
memberaddr: 10.128.61.60
}
member {
memberaddr: 10.128.62.60
}
# The following values need to be set based on your environment 
ringnumber: 0
bindnetaddr: 10.128.62.0
mcastport: 5405
}
transport: udpu

Something definitely seems to be wrong. My firewall was under very high load...


Mit freundlichen Grüßen / Best regards
Björn 


-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lukas Grossar
Sent: Thursday, 23 January 2014 16:54
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover

Hi Björn

Here is an example of how you can set up corosync to use unicast UDP:
https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu

The important parts are transport: udpu, and that you need to configure every 
member manually using e.g. memberaddr: 10.16.35.115.

Best regards
Lukas


On Thu, 23 Jan 2014 13:36:22 +
bjoern.bec...@easycash.de wrote:

 Hello,
 
 thanks a lot! I didn't know about heartbeat is almost deprecated.
 I'll try corosync and pacemaker, but I read that corosync need to run 
 over multicast. Unfortunately, I can't use multicast in my network.
 Do you know any other possibility, I can't find anything that corosync 
 can run without multicast?
 
 
 Best regards
 Björn
 
 -----Original Message-----
 
 From: linux-ha-boun...@lists.linux-ha.org
 [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
 Sent: Wednesday, 22 January 2014 20:36 To: General Linux-HA mailing 
 list Subject: Re: [Linux-HA] heartbeat failover
 
 On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:
  Hello,
 
  I got a drbd+nfs+heartbeat setup and in general it's working. But it 
  takes to long to failover and I try to tune this.
 
  When node 1 is active and I shutdown

Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Bjoern.Becker
Hello,

thanks a lot! I didn't know that heartbeat is almost deprecated.
I'll try corosync and pacemaker, but I read that corosync needs to run over 
multicast.
Unfortunately, I can't use multicast in my network. Do you know any other 
possibility? I can't find anything saying that corosync can run without multicast.


Best regards
Björn 

-----Original Message-----

From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
Sent: Wednesday, 22 January 2014 20:36
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] heartbeat failover

On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:
 Hello,

 I got a drbd+nfs+heartbeat setup and in general it's working. But it takes to 
 long to failover and I try to tune this.

 When node 1 is active and I shutdown node 2, then node 1 try to activate the 
 cluster.
 The problem is, node 1 already got the primary role and when re-activating it 
 take time again and during this the nfs share isn't available.

 Is it possible to disable this? Node 1 don't have to do anything if it's 
 already in primary role and the second node is not available.

 Mit freundlichen Grüßen / Best regards Björn

If this is a new project, I strongly recommend switching out heartbeat for 
corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long 
time and there are no plans to restart development in the future. Everything 
(even RH) is standardizing on the corosync+pacemaker stack, so it has the most 
vibrant community as well.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Lukas Grossar
Hi Björn

Here is an example of how you can set up corosync to use unicast UDP:
https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu

The important parts are transport: udpu, and that you need to
configure every member manually using e.g. memberaddr: 10.16.35.115.
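Putting those two parts together, a minimal corosync 1.x udpu interface section
might look roughly like this (the addresses are the placeholders from this
thread; bindnetaddr must match the local node's own subnet):

```
# Hypothetical corosync.conf fragment (corosync 1.x udpu syntax);
# adjust addresses to your environment.
totem {
    version: 2
    transport: udpu
    interface {
        ringnumber: 0
        bindnetaddr: 10.128.61.0      # this node's network address
        mcastport: 5405               # also used as the udpu port
        member {
            memberaddr: 10.128.61.60  # node 1
        }
        member {
            memberaddr: 10.128.62.60  # node 2
        }
    }
}
```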

Best regards
Lukas


On Thu, 23 Jan 2014 13:36:22 +
bjoern.bec...@easycash.de wrote:

 Hello,
 
 thanks a lot! I didn't know about heartbeat is almost deprecated.
 I'll try corosync and pacemaker, but I read that corosync need to run
 over multicast. Unfortunately, I can't use multicast in my network.
 Do you know any other possibility, I can't find anything that
 corosync can run without multicast?
 
 
 Best regards
 Björn 
 
 -----Original Message-----
 
 From: linux-ha-boun...@lists.linux-ha.org
 [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
 Sent: Wednesday, 22 January 2014 20:36 To: General Linux-HA
 mailing list Subject: Re: [Linux-HA] heartbeat failover
 
 On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:
  Hello,
 
  I got a drbd+nfs+heartbeat setup and in general it's working. But
  it takes to long to failover and I try to tune this.
 
  When node 1 is active and I shutdown node 2, then node 1 try to
  activate the cluster. The problem is, node 1 already got the
  primary role and when re-activating it take time again and during
  this the nfs share isn't available.
 
  Is it possible to disable this? Node 1 don't have to do anything if
  it's already in primary role and the second node is not available.
 
  Mit freundlichen Grüßen / Best regards Björn
 
 If this is a new project, I strongly recommend switching out
 heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't
 been developed in a long time and there are no plans to restart
 development in the future. Everything (even RH) is standardizing on
 the corosync+pacemaker stack, so it has the most vibrant community as
 well.
 



-- 
Adfinis SyGroup AG
Lukas Grossar, System Engineer

Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06



Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Bjoern.Becker
Hi Lukas,

thank you. Well, I have to wait for some firewall changes for 5405/UDP. 

But I'm not sure whether what I'm doing is correct.

Node1:
interface {
member {
memberaddr: 10.128.61.60 # node 1 
}
member {
memberaddr: 10.128.62.60 # node 2
}
# The following values need to be set based on your environment 
ringnumber: 0
bindnetaddr: 10.128.61.0
mcastport: 5405
}
transport: udpu

Node2: 
interface {
member {
memberaddr: 10.128.61.60
}
member {
memberaddr: 10.128.62.60
}
# The following values need to be set based on your environment 
ringnumber: 0
bindnetaddr: 10.128.62.0
mcastport: 5405
}
transport: udpu

Something definitely seems to be wrong. My firewall was under very high load...


Mit freundlichen Grüßen / Best regards
Björn 


-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lukas Grossar
Sent: Thursday, 23 January 2014 16:54
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover

Hi Björn

Here is an example of how you can set up corosync to use unicast UDP:
https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu

The important parts are transport: udpu, and that you need to configure every 
member manually using e.g. memberaddr: 10.16.35.115.

Best regards
Lukas


On Thu, 23 Jan 2014 13:36:22 +
bjoern.bec...@easycash.de wrote:

 Hello,
 
 thanks a lot! I didn't know about heartbeat is almost deprecated.
 I'll try corosync and pacemaker, but I read that corosync need to run 
 over multicast. Unfortunately, I can't use multicast in my network.
 Do you know any other possibility, I can't find anything that corosync 
 can run without multicast?
 
 
 Best regards
 Björn
 
 -----Original Message-----
 
 From: linux-ha-boun...@lists.linux-ha.org
 [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
 Sent: Wednesday, 22 January 2014 20:36 To: General Linux-HA mailing 
 list Subject: Re: [Linux-HA] heartbeat failover
 
 On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:
  Hello,
 
  I got a drbd+nfs+heartbeat setup and in general it's working. But it 
  takes to long to failover and I try to tune this.
 
  When node 1 is active and I shutdown node 2, then node 1 try to 
  activate the cluster. The problem is, node 1 already got the primary 
  role and when re-activating it take time again and during this the 
  nfs share isn't available.
 
  Is it possible to disable this? Node 1 don't have to do anything if 
  it's already in primary role and the second node is not available.
 
  Mit freundlichen Grüßen / Best regards Björn
 
 If this is a new project, I strongly recommend switching out heartbeat 
 for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed 
 in a long time and there are no plans to restart development in the 
 future. Everything (even RH) is standardizing on the 
 corosync+pacemaker stack, so it has the most vibrant community as 
 well.
 



--
Adfinis SyGroup AG
Lukas Grossar, System Engineer

Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06

Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Bjoern.Becker
Uhhh.. I have the same configuration as the example config you sent me now. 
But I am causing high CPU load on our Cisco ASA firewall..

I guess this traffic is not normal?

root@node01:/etc/corosync# tcpdump dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:06.093140 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.097327 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.113418 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.286517 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.291095 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.480221 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.484520 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.500608 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.673721 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.678654 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.867757 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.872492 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.888576 IP node01.52580 > node02.5405: UDP, length 82
17:41:07.061664 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.066304 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.255409 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.260512 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.275601 IP node01.52580 > node02.5405: UDP, length 82

Mit freundlichen Grüßen / Best regards
Björn 


-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Becker, Björn
Sent: Thursday, 23 January 2014 17:28
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover

Hi Lukas,

thank you. Well, I have to wait for some firewall changes for 5405/UDP. 

But I'm not sure whether what I'm doing is correct.

Node1:
interface {
member {
memberaddr: 10.128.61.60 # node 1 
}
member {
memberaddr: 10.128.62.60 # node 2
}
# The following values need to be set based on your environment 
ringnumber: 0
bindnetaddr: 10.128.61.0
mcastport: 5405
}
transport: udpu

Node2: 
interface {
member {
memberaddr: 10.128.61.60
}
member {
memberaddr: 10.128.62.60
}
# The following values need to be set based on your environment 
ringnumber: 0
bindnetaddr: 10.128.62.0
mcastport: 5405
}
transport: udpu

Something definitely seems to be wrong. My firewall was under very high load...


Mit freundlichen Grüßen / Best regards
Björn 


-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lukas Grossar
Sent: Thursday, 23 January 2014 16:54
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover

Hi Björn

Here is an example of how you can set up corosync to use unicast UDP:
https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu

The important parts are transport: udpu, and that you need to configure every 
member manually using e.g. memberaddr: 10.16.35.115.

Best regards
Lukas


On Thu, 23 Jan 2014 13:36:22 +
bjoern.bec...@easycash.de wrote:

 Hello,
 
 thanks a lot! I didn't know about heartbeat is almost deprecated.
 I'll try corosync and pacemaker, but I read that corosync need to run 
 over multicast. Unfortunately, I can't use multicast in my network.
 Do you know any other possibility, I can't find anything that corosync 
 can run without multicast?
 
 
 Best regards
 Björn
 
 -----Original Message-----
 
 From: linux-ha-boun...@lists.linux-ha.org
 [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
 Sent: Wednesday, 22 January 2014 20:36 To: General Linux-HA mailing 
 list Subject: Re: [Linux-HA] heartbeat failover
 
 On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:
  Hello,
 
  I got a drbd+nfs+heartbeat setup and in general it's working. But it 
  takes to long to failover and I try to tune this.
 
  When node 1 is active and I shutdown node 2, then node 1 try to 
  activate the cluster. The problem is, node 1 already got the primary 
  role and when re-activating it take time again and during this the 
  nfs share isn't available.
 
  Is it possible to disable this? Node 1 don't have to do anything if 
  it's already in primary role and the second node is not available.
 
  Mit freundlichen Grüßen / Best regards Björn
 
 If this is a new project, I strongly recommend switching out heartbeat 
 for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed 
 in a long time and there are no plans to restart

Re: [Linux-HA] heartbeat failover

2014-01-22 Thread Digimer

On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:

Hello,

I have a drbd+nfs+heartbeat setup and in general it's working. But it takes too 
long to fail over, and I am trying to tune this.

When node 1 is active and I shut down node 2, node 1 tries to activate the 
cluster.
The problem is, node 1 already has the primary role, and re-activating it 
takes time again; during this the NFS share isn't available.

Is it possible to disable this? Node 1 doesn't have to do anything if it's 
already in the primary role and the second node is not available.

Mit freundlichen Grüßen / Best regards
Björn


If this is a new project, I strongly recommend switching out heartbeat 
for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed 
in a long time and there are no plans to restart development in the 
future. Everything (even RH) is standardizing on the corosync+pacemaker 
stack, so it has the most vibrant community as well.


--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



Re: [Linux-HA] Heartbeat errors related to Gmain_timeout_dispatch at low traffic

2013-11-20 Thread Savita Kulkarni
Hi Lars,

We observed one pattern with these errors: in most cases, on both VMs,
the errors came at the same time.
We suspect either a network issue (but in that case only late heartbeat
errors should appear, not the Gmain_timeout_dispatch errors, right?), or that the
VM is getting paused for some time for some reason, and when it is resumed
the Gmain_timeout_dispatch / late heartbeat errors appear.
We are investigating this further.


@heartbeat 3 - for this issue, most of the advice given was to upgrade.
But we are using the same heartbeat version
in other setups as well, and it is working fine there.

What do you think?

Regards,
Savita


On Tue, Nov 19, 2013 at 4:23 PM, Lars Ellenberg
lars.ellenb...@linbit.comwrote:

 On Thu, Nov 14, 2013 at 04:46:16PM +0530, Savita Kulkarni wrote:
  Hi,
 
  Recently we are seeing lots of heartbeat errors related to
  Gmain_timeout_dispatch
  on our system.
  I checked on mailing list archives if other people have faced this issue.
  There are few email threads regarding this but people are seeing this
 issue
  in case of high load.
 
  On our system there is very low/no load is present.
 
  We are running heartbeat on guest VMs, using VMWARE ESXi 5.0.
  We have heartbeat -2.1.3-4
  It is working fine without any issues on other other setups and issue is
  coming only on this setup.
 
  Following types of errors are present in /var/log/messages
 
  Nov 12 09:58:43  heartbeat: [23036]: WARN: Gmain_timeout_dispatch:
  Dispatch function for send local status was delayed 15270 ms (> 1010
  ms) before being called (GSource: 0x138926b8)
  Nov 12 09:59:00  heartbeat: [23036]: info: Gmain_timeout_dispatch:
  started at 583294569 should have started at 583293042
  Nov 12 09:59:00 heartbeat: [23036]: WARN: Gmain_timeout_dispatch:
  Dispatch function for update msgfree count was delayed 33960 ms (>
  1 ms) before being called (GSource: 0x13892f58)
 
  Can anyone tell me what can be the issue?
 
  Can it be a hardware issue?

 Could be many things, even that, yes.

 Could be that upgrading to recent heartbeat 3 helps.

 Could be that there is too little load, and your virtualization just
 stops scheduling the VM itself, because it thinks it is underutilized...

 Does it recover if you kill/restart heartbeat?

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.



Re: [Linux-HA] Heartbeat errors related to Gmain_timeout_dispatch at low traffic

2013-11-19 Thread Lars Ellenberg
On Thu, Nov 14, 2013 at 04:46:16PM +0530, Savita Kulkarni wrote:
 Hi,
 
 Recently we are seeing lots of heartbeat errors related to
 Gmain_timeout_dispatch
 on our system.
 I checked the mailing list archives to see if other people have faced this issue.
 There are a few email threads about it, but people are seeing this issue
 in case of high load.
 
 On our system there is very low or no load.
 
 We are running heartbeat on guest VMs, using VMware ESXi 5.0.
 We have heartbeat-2.1.3-4.
 It is working fine without any issues on other setups, and the issue is
 coming up only on this setup.
 
 Following types of errors are present in /var/log/messages
 
 Nov 12 09:58:43  heartbeat: [23036]: WARN: Gmain_timeout_dispatch:
 Dispatch function for send local status was delayed 15270 ms (> 1010
 ms) before being called (GSource: 0x138926b8)
 Nov 12 09:59:00  heartbeat: [23036]: info: Gmain_timeout_dispatch:
 started at 583294569 should have started at 583293042
 Nov 12 09:59:00 heartbeat: [23036]: WARN: Gmain_timeout_dispatch:
 Dispatch function for update msgfree count was delayed 33960 ms (>
 1 ms) before being called (GSource: 0x13892f58)
 
 Can anyone tell me what can be the issue?
 
 Can it be a hardware issue?

Could be many things, even that, yes.

Could be that upgrading to recent heartbeat 3 helps.

Could be that there is too little load, and your virtualization just
stops scheduling the VM itself, because it thinks it is underutilized...

Does it recover if you kill/restart heartbeat?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


Re: [Linux-HA] Heartbeat v1 and stonith/stonith_host ipmilan

2013-07-17 Thread Martin Langhoff
On Wed, Jul 17, 2013 at 6:03 PM, Martin Langhoff
martin.langh...@gmail.com wrote:
 But the 'stonith' script/binary and the scripts that the old
 documentation indicates aren't there anymore (when I install on
 RHEL6.4).

Configuring stonith_host external foo bar baz led me in the right
direction. heartbeat knows what to do, but on RHEL/CentOS/SL 6.x
cluster-glue no longer includes stonith agents.

Some info at http://www.gossamer-threads.com/lists/linuxha/pacemaker/74487

So I rebuilt the RPMs for cluster-glue reversing that removal.


It is a dicey proposition, of course, to set up a cluster that I expect
to be long-lived based on software that folks are running to
deprecate. But I have played with corosync + pacemaker extensively,
and TBH they are way overkill for a simple setup.

Is there a _simple_ setup guide for a two node cluster? Y'know, LVM,
couple mountpoints, one server daemon (mysql)?

I am not afraid of complexity; but I like to pick where to invest in
complexity :-)

cheers,



m
--
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: [Linux-HA] Heartbeat v1 and stonith/stonith_host ipmilan

2013-07-17 Thread Digimer

On 17/07/13 20:43, Martin Langhoff wrote:

On Wed, Jul 17, 2013 at 6:03 PM, Martin Langhoff
martin.langh...@gmail.com wrote:

But the 'stonith' script/binary and the scripts that the old
documentation indicates aren't there anymore (when I install on
RHEL6.4).


Configuring stonith_host external foo bar baz led me in the right
direction. heartbeat knows what to do, but on RHEL/CentOS/SL 6.x
cluster-glue no longer includes stonith agents.

Some info at http://www.gossamer-threads.com/lists/linuxha/pacemaker/74487

So I rebuilt the RPMs for cluster-glue reversing that removal.


It is a dicey proposition, of course, to set up a cluster that I expect
to be long-lived based on software that folks are running to
deprecate. But I have played with corosync + pacemaker extensively,
and TBH they are way overkill for a simple setup.

Is there a _simple_ setup guide for a two node cluster? Y'know, LVM,
couple mountpoints, one server daemon (mysql)?

I am not afraid of complexity; but I like to pick where to invest in
complexity :-)

cheers,


The easiest, native way under RHEL/CentOS is to use corosync + cman + 
rgmanager. The configuration you are describing will be simple and will 
be properly supported until 2020 (at least), and not need hacks.


If you're interested in this approach, I can help. Here or on 
#linux-cluster on freenode's IRC.


digimer

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



Re: [Linux-HA] Heartbeat v1 and stonith/stonith_host ipmilan

2013-07-17 Thread Martin Langhoff
On Wed, Jul 17, 2013 at 9:34 PM, Digimer li...@alteeve.ca wrote:
 The easiest, native way under RHEL/CentOS is to use corosync + cman +
 rgmanager. The configuration you are describing will be simple and will be
 properly supported until 2020 (at least), and not need hacks.

 If you're interested in this approach, I can help. Here or on #linux-cluster
 on freenode's IRC.

Thanks for the offer to help. Is there any clear setup guide you can
point me to?

My TZ is EDT, so midnight (bedtime!) now. I won't be awake and on
email/irc until tomorrow morning.



m
--
 martin.langh...@gmail.com
 -  ask interesting questions
 - don't get distracted with shiny stuff  - working code first
 ~ http://docs.moodle.org/en/User:Martin_Langhoff


Re: [Linux-HA] Heartbeat v1 and stonith/stonith_host ipmilan

2013-07-17 Thread Digimer

On 18/07/13 00:12, Martin Langhoff wrote:

On Wed, Jul 17, 2013 at 9:34 PM, Digimer li...@alteeve.ca wrote:

The easiest, native way under RHEL/CentOS is to use corosync + cman +
rgmanager. The configuration you are describing will be simple and will be
properly supported until 2020 (at least), and not need hacks.

If you're interested in this approach, I can help. Here or on #linux-cluster
on freenode's IRC.


Thanks for the offer to help. Is there any clear setup guide you can
point me to?

My TZ is EDT, so midnight (bedtime!) now. I won't be awake and on
email/irc until tomorrow morning.


Heh, same timezone, but I'm more of a night owl. :)

I have a tutorial that was written for people who want to host 
highly-available VMs on a two-node red hat cluster. It goes into a lot 
of detail that you may not be interested in, but I think it's pretty 
comprehensive (I tried to assume no prior knowledge of HA). So perhaps 
you can tease out the parts you're interested in.


https://alteeve.ca/w/2-Node_Red_Hat_KVM_Cluster_Tutorial

Your configuration would need, basically;

* Node definitions with fence methods defined
* Resource section covering your storage and daemon
* failover domain to control which node is primary for a given service 
and which is the backup


The tutorial covers clustered LVM and uses the GFS2 clustered file 
system. So it anticipates a somewhat complex setup. If you are looking 
for simple failover, you can skip all of that. You could even dump LVM 
all together, if your goal is to simply support MySQL's data storage.


So the config, in this case, would be;

* The cluster name is foo
* This is a two node cluster (disable quorum)
** Node 1 is this, and here is how you fence it
** Node 2 is this, and here is how you fence it
* Resources;
** I have a file system resource called X mounted at Y
** I have a script resource that controls daemon Z
* Failover Domain
** I have an ordered domain that says run on node 1 when possible, node 
2 otherwise. If you fail over to node 2, stay there when node 1 returns

* Service
** Create an ordered service that follows the rules set in the failover 
domain. This service requires the FS to mount before the daemon service 
starts. Stop in the reverse order
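The outline above maps onto a cluster.conf sketch roughly like this (node
names, fence device parameters, the storage device and the init script are
all hypothetical placeholders, not values from this thread):

```xml
<?xml version="1.0"?>
<!-- Hypothetical sketch only; substitute your own names and parameters -->
<cluster name="foo" config_version="1">
  <cman two_node="1" expected_votes="1"/>
  <clusternodes>
    <clusternode name="node1" nodeid="1">
      <fence><method name="ipmi"><device name="fence_n1"/></method></fence>
    </clusternode>
    <clusternode name="node2" nodeid="2">
      <fence><method name="ipmi"><device name="fence_n2"/></method></fence>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice name="fence_n1" agent="fence_ipmilan" ipaddr="10.0.0.1" login="admin" passwd="secret"/>
    <fencedevice name="fence_n2" agent="fence_ipmilan" ipaddr="10.0.0.2" login="admin" passwd="secret"/>
  </fencedevices>
  <rm>
    <failoverdomains>
      <!-- ordered: prefer node1; nofailback: stay on node2 after failover -->
      <failoverdomain name="prefer_node1" ordered="1" nofailback="1">
        <failoverdomainnode name="node1" priority="1"/>
        <failoverdomainnode name="node2" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <resources>
      <fs name="data" device="/dev/sda1" mountpoint="/srv/mysql" fstype="ext4"/>
      <script name="mysql" file="/etc/init.d/mysqld"/>
    </resources>
    <!-- nesting: the filesystem mounts before mysql starts, stops after -->
    <service name="db" domain="prefer_node1" recovery="relocate">
      <fs ref="data">
        <script ref="mysql"/>
      </fs>
    </service>
  </rm>
</cluster>
```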


That's it. It might seem a little overwhelming at first, but it really 
is pretty simple. You already understand the concept of fencing, which 
trips up most people, so you're more than half-way there. As long as 
your switch handles multicast, you're golden. If not, no big deal, just 
add the configuration option that forces unicast mode.


hope this helps

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat haresources with IPv6

2013-06-17 Thread Digimer

Hi Thiago,

  Heartbeat is deprecated and has not seen development in some time. 
There are no plans to restart development, either. It is _strongly_ 
advised that new setups use corosync + pacemaker. You can use the IPv6 
resource agents with it, too.


  The best place to look is clusterlabs.org's Cluster from Scratch 
tutorial. Its first example covers setting up an (IPv4) virtual IP 
address, which should be easy to adapt to your IPv6 implementation. 
You will see two versions: one for crmsh and one for pcs. I would 
recommend the crmsh version for Ubuntu.
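
As a rough illustration, a crmsh configuration for an IPv6 virtual IP
might look like the snippet below (the address, prefix, and resource
name are placeholders, not from the original setup):

```
# hypothetical crmsh sketch of an IPv6 virtual IP via the IPv6addr agent
crm configure primitive vip6 ocf:heartbeat:IPv6addr \
    params ipv6addr="2001:db8::10/64" \
    op monitor interval="15" timeout="15"
```

The same resource can be expressed with pcs instead; the tutorial shows
both syntaxes.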


Cheers

On 06/17/2013 11:35 AM, lis...@adminlinux.com.br wrote:

Hi,


I'm using Ubuntu 12.04 + Heartbeat 3.0.5-3ubuntu2 to provide high
availability for some IP addresses.
I want to configure an IPv6 address on my haresources. I did this:

File /etc/heartbeat/haresources:

server.domain.com \
    192.168.2.62/32/eth1 \
    192.168.2.64/32/eth1 \
    192.168.2.72/32/eth1 \
    IPv6addr::2001:db8:38a5:8::2006/48/eth1 \
    MailTo::a...@domain.com

The IPv4 addresses work fine, but I'm not getting success with the IPv6
address.
My logs show this message:
ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr
2001:db8:38a5:8 2006/48/eth1 start
ResourceManager[22129]: CRIT: Giving up resources due to failure of
IPv6addr::2001:db8:38a5:8::2006/48/eth1
ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr
2001:db8:38a5:8 2006/48/eth1 stop
ResourceManager[22129]: info: Retrying failed stop operation
[IPv6addr::2001:db8:38a5:8::2006/48/eth1]

Apparently there is a conflict between the '::' characters inside the
IPv6 address and the '::' separator used in haresources. But I would
not like to have to expand the IPv6 address.
Does anyone know a way to avoid this conflict?

Thanks!
--
Thiago Henrique
www.adminlinux.com.br










--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?



Re: [Linux-HA] Heartbeat haresources with IPv6

2013-06-15 Thread Lars Ellenberg
On Fri, Jun 14, 2013 at 03:29:49PM -0300, lis...@adminlinux.com.br wrote:
 Hi,
 
 I'm using Ubuntu 12.04 + Heartbeat 3.0.5-3ubuntu2 to provide high 
 availability for some IP addresses.
 I want to configure an IPv6 address on my haresources. I did this:
 
 File /etc/heartbeat/haresources:
 
 server.domain.com \
192.168.2.62/32/eth1 \
192.168.2.64/32/eth1 \
192.168.2.72/32/eth1 \
IPv6addr::2001:db8:38a5:8::2006/48/eth1 \
MailTo::a...@domain.com
 
 The IPv4 addresses work fine, but I'm not getting success with the IPv6 
 address.
 My logs shows this message:
 ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr 
 2001:db8:38a5:8 2006/48/eth1 start
 ResourceManager[22129]: CRIT: Giving up resources due to failure of 
 IPv6addr::2001:db8:38a5:8::2006/48/eth1
 ResourceManager[22129]: info: Running /etc/ha.d/resource.d/IPv6addr 
 2001:db8:38a5:8 2006/48/eth1 stop
 ResourceManager[22129]: info: Retrying failed stop operation 
 [IPv6addr::2001:db8:38a5:8::2006/48/eth1]
 
 Apparently there is a conflict between the characters '::' inside
 the IPv6 address and the separator '::' used in the haresources. But
 I would not like have to expand the IPv6 address.
 
 Does anyone know a way to avoid this conflict?

You can't have it all ;-)

I see several options.
 - use 2001:db8:38a5:8:0:0:0:2006/48/eth1
 - abandon haresources
 - hack the ResourceManager script of heartbeat,
   allow for escaping, or special case IPv6addr or similar...
   it's plain shell after all
 - hack the resource.d/IPv6addr *wrapper* script only,
   to mangle the input parameters.

The last two options would look something like below.
You need only *one* of these, though using both would not hurt.
Untested, and likely whitespace mangled ;-)

--- ResourceManager
+++ ResourceManager
@@ -167,6 +167,11 @@ resource2script() {
 # multiple arguments are separated by :: delimiters
 resource2arg() {
   case `canonname $1` in
+IPv6addr::*)
+   # special case, there is only one argument,
+   # and it contains ::
+   echo $1 | sed 's%[^:]*::%%'
+   ;;
 *::*)  echo $1 | sed 's%[^:]*::%%' | sed 's%::% %g'
;;
   esac

--- IPv6addr
+++ IPv6addr
@@ -17,6 +17,8 @@ usage() {
 exit 1
 }

+[ $# = 3 ] && set -- "$1::$2" "$3"
+
 if [ $# != 2 ]; then
 usage
 fi
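
For reference, the splitting behaviour that causes the failure can be
reproduced by hand. This is a demonstration added here (not from the
original mail) mirroring the sed pipeline in resource2arg, and it
produces exactly the mangled arguments seen in the log:

```shell
# hypothetical demonstration of how the stock resource2arg logic
# splits an IPv6addr haresources entry
arg='IPv6addr::2001:db8:38a5:8::2006/48/eth1'

# step 1: strip the resource-agent name up to the first '::'
step1=$(echo "$arg" | sed 's%[^:]*::%%')
echo "$step1"                 # 2001:db8:38a5:8::2006/48/eth1

# step 2: treat every remaining '::' as an argument separator
echo "$step1" | sed 's%::% %g'   # 2001:db8:38a5:8 2006/48/eth1
# the single IPv6 argument is split in two, matching the
# "Running ... IPv6addr 2001:db8:38a5:8 2006/48/eth1 start" log line
```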


Cheers,

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.


Re: [Linux-HA] heartbeat 'ERROR' messages

2013-05-28 Thread Greg Woods
I know it's tacky to reply to myself, but I can answer one of my
questions after another 15 minutes or so of poring through logs:

On Tue, 2013-05-28 at 10:37 -0600, Greg Woods wrote:

 
 The questions are what do these messages actually mean, why is one
 cluster logging them and not the other, and is this something I should
 be worried about?

The answer to the last one is that this is definitely a problem, because
after nearly half an hour, this is logged:

May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] :
[src=vmx1.ucar.edu]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] :
[(1)srcuuid=0x5ceb390(36 27)]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] :
[seq=3a4]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] :
[hg=4c97c17a]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] :
[ts=51a13888]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] :
[ld=0.50 0.33 0.28 3/316 13859]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[10] :
[ttl=3]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[11] :
[auth=1 feb94da356847a538290ea75f27423c996c0a595]
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5689]: ERROR: write_child:
Exiting due to persistent errors: No such device
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: WARN: Managed HBWRITE
process 5689 exited with return code 1.
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: ERROR: HBWRITE process
died.  Beginning communications restart process for comm channel 1.
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: info: glib: UDP
Broadcast heartbeat closed on port 694 interface eth4 - Status: 1
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: WARN: Managed HBREAD
process 5690 killed by signal 9 [SIGKILL - Kill, unblockable].
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: ERROR: Both comm
processes for channel 1 have died.  Restarting.
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: info: glib: UDP
Broadcast heartbeat started on port 694 (694) interface eth4
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: info: glib: UDP
Broadcast heartbeat closed on port 694 interface eth4 - Status: 1
May 25 16:17:44 vmx1.ucar.edu heartbeat: [5683]: info: Communications
restart succeeded.
May 25 16:17:45 vmx1.ucar.edu heartbeat: [5683]: info: Link
vmx2.ucar.edu:eth4 up.

And VMs stop being reachable, etc. The only way to stabilize things is
to not start heartbeat on one of the nodes (vmx1 arbitrarily chosen) and
run all resources on a single node (vmx2 in this case).

--Greg




Re: [Linux-HA] heartbeat 'ERROR' messages

2013-05-28 Thread Andrew Beekhof

On 29/05/2013, at 2:37 AM, Greg Woods wo...@ucar.edu wrote:

 I have two clusters that are both running CentOS 5.6 and
 heartbeat-3.0.3-2.3.el5 (from the clusterlabs repo). They are running
 slightly different pacemaker versions (pacemaker-1.0.9.1-1.15.el5 on the
 first one and pacemaker-1.0.12-1.el5 on the other) They both have
 identical ha.cf files except that the bcast device names are different
 (and they are correct for each case, I checked), like this:
 
 udpport 694
 bcast eth2
 bcast eth1
 use_logd off
 logfile /var/log/halog
 debugfile /var/log/hadebug
 debug 1
 keepalive 2
 deadtime 15
 initdead 60
 node vmd1.ucar.edu
 node vmd2.ucar.edu
 auto_failback off
 respawn hacluster /usr/lib64/heartbeat/ipfail
 crm respawn

I don't know about the rest, but definitely do not use both ipfail and crm.
Pick one :)

 
 On one of them (which maybe or maybe not coincidentally is having some
 problems), I get these messages logged about every 2 seconds
 in /var/log/halog, on the other I don't see them:
 
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG: Dumping
 message with 10 fields
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[0] :
 [t=NS_ackmsg]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[1] :
 [dest=vmx2.ucar.edu]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] :
 [ackseq=3a0]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[3] :
 [(1)destuuid=0x5ceb280(37 28)]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] :
 [src=vmx1.ucar.edu]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] :
 [(1)srcuuid=0x5ceb390(36 27)]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] :
 [hg=4c97c17a]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] :
 [ts=51a13435]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] : [ttl=3]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] : [auth=1
 23b556bcb61a08abecf87cb6411c62e62cf99f0d]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG: Dumping
 message with 12 fields
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[0] :
 [t=status]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[1] :
 [st=active]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[2] :
 [dt=3a98]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[3] :
 [protocol=1]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[4] :
 [src=vmx1.ucar.edu]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[5] :
 [(1)srcuuid=0x5ceb390(36 27)]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[6] :
 [seq=17b]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[7] :
 [hg=4c97c17a]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[8] :
 [ts=51a13435]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[9] :
 [ld=0.27 0.41 0.26 1/315 19183]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[10] :
 [ttl=3]
 May 25 15:59:17 vmx1.ucar.edu heartbeat: [5689]: ERROR: MSG[11] :
 [auth=1 3d3da4df831636f7c274395041ffb49bbf215170]
 
 The questions are what do these messages actually mean, why is one
 cluster logging them and not the other, and is this something I should
 be worried about?
 
 Thanks for any info,
 --Greg
 
 



Re: [Linux-HA] heartbeat 'ERROR' messages

2013-05-28 Thread Greg Woods
On Wed, 2013-05-29 at 07:50 +1000, Andrew Beekhof wrote:

  respawn hacluster /usr/lib64/heartbeat/ipfail
  crm respawn
 
 I don't know about the rest, but definitely do not use both ipfail and crm.
 Pick one :)

I guess I will have to look into what ipfail really does. I have a half
dozen clusters that have virtually the same ha.cf files and they have
been running for 2+ years with it specified this way.

--Greg


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat 'ERROR' messages

2013-05-28 Thread Andrew Beekhof

On 29/05/2013, at 8:05 AM, Greg Woods wo...@ucar.edu wrote:

 On Wed, 2013-05-29 at 07:50 +1000, Andrew Beekhof wrote:
 
 respawn hacluster /usr/lib64/heartbeat/ipfail
 crm respawn
 
 I don't know about the rest, but definitely do not use both ipfail and crm.
 Pick one :)
 
 I guess I will have to look into what ipfail really does.

With crm enabled, nothing.
Try 
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html/Pacemaker_Explained/_moving_resources_due_to_connectivity_changes.html

 I have a half
 dozen clusters that have virtually the same ha.cf files and they have
 been running for 2+ years with it specified this way.
 
 --Greg
 
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-25 Thread Keisuke MORI
Hi Nick,

Could you provide the version of resource-agents you're using?

Prior to 3.9.2, IPv6addr requires a static IPv6 address with the
exact same prefix to find an appropriate NIC; so you should have
statically assigned 2600:3c00::34:c003/116 on eth0, for example.

As of 3.9.3, this has been relaxed and the specified NIC is always
used even if the prefix does not match; so it should just work (at
least it works for me).

Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a
virtual IPv6 address; this is brand new, and I would prefer it
because it uses the standard ip command.

Thanks,

2013/3/25 Nick Walke tubaguy50...@gmail.com:
 This the correct place to report bugs?
 https://github.com/ClusterLabs/resource-agents

 Nick


 On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.dewrote:

 Hello Nick,

  I shouldn't be able to do that if the IPv6 module wasn't loaded,
  correct?

 that is correct. I tried modifying my netmask to copy yours. And I get
 the same error, you do:

 ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete):
 unknown error

 So probably a bug in the resource agent. Manually adding and removing
 works:

 (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
 (node-62) [~] ip -6 addr show dev eth0
 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qlen 1000
 inet6 2a01:4f8:bb:400::2/116 scope global
valid_lft forever preferred_lft forever
 inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
valid_lft 2591887sec preferred_lft 604687sec
 inet6 fe80::225:90ff:fe97:dbb0/64 scope link
valid_lft forever preferred_lft forever
 (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0

 Nick, you can do the following things to resolve this:

 - Hunt down the bug and fix it or let someone else do it for you

 - Use another netmask, if possible (fighting the symptoms instead
 of
   resolving the root cause)

 - Write your own resource agent (fighting the symptoms instead of
   resolving the root cause)

 Cheers,
 Thomas




-- 
Keisuke MORI


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-25 Thread Nick Walke
Looks like 3.9.2-5.  So I need to statically assign the address I want to
use before using it with IPv6addr?
On Mar 25, 2013 3:44 AM, Keisuke MORI keisuke.mori...@gmail.com wrote:

 Hi Nick,

 Could you privide which version of resource-agents you're using?

 Prior to 3.9.2, IPv6addr requires a static IPv6 address with the
 exactly same prefix to find out an apropriate nic; so you should have
 statically assigned   2600:3c00::34:c003/116 on eth0 for example.

 As of 3.9.3, it has relaxed and the specified nic is always used no
 matter if the prefix does not match; so it should just work. (at least
 it works for me)

 Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a
 virtual IPv6 address, which is brand new and I would prefer this
 because it uses the standard ip command.

 Thanks,

 2013/3/25 Nick Walke tubaguy50...@gmail.com:
  This the correct place to report bugs?
  https://github.com/ClusterLabs/resource-agents
 
  Nick
 
 
  On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de
 wrote:
 
  Hello Nick,
 
   I shouldn't be able to do that if the IPv6 module wasn't loaded,
   correct?
 
  that is correct. I tried modifying my netmask to copy yours. And I get
  the same error, you do:
 
  ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete):
  unknown error
 
  So probably a bug in the resource agent. Manually adding and removing
  works:
 
  (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
  (node-62) [~] ip -6 addr show dev eth0
  2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
  inet6 2a01:4f8:bb:400::2/116 scope global
 valid_lft forever preferred_lft forever
  inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
 valid_lft 2591887sec preferred_lft 604687sec
  inet6 fe80::225:90ff:fe97:dbb0/64 scope link
 valid_lft forever preferred_lft forever
  (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0
 
  Nick, you can do the following things to resolve this:
 
  - Hunt down the bug and fix it or let someone else do it for you
 
  - Use another netmask, if possible (fighting the symptoms
 instead
  of
resolving the root cause)
 
  - Write your own resource agent (fighting the symptoms instead
 of
resolving the root cause)
 
  Cheers,
  Thomas
 



 --
 Keisuke MORI



Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-25 Thread Keisuke MORI
2013/3/25 Nick Walke tubaguy50...@gmail.com:
 Looks like 3.9.2-5.  So I need to statically assign the address I want to
 use before using it with IPv6addr?

Yes.


 On Mar 25, 2013 3:44 AM, Keisuke MORI keisuke.mori...@gmail.com wrote:

 Hi Nick,

 Could you privide which version of resource-agents you're using?

 Prior to 3.9.2, IPv6addr requires a static IPv6 address with the
 exactly same prefix to find out an apropriate nic; so you should have
 statically assigned   2600:3c00::34:c003/116 on eth0 for example.

 As of 3.9.3, it has relaxed and the specified nic is always used no
 matter if the prefix does not match; so it should just work. (at least
 it works for me)

 Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a
 virtual IPv6 address, which is brand new and I would prefer this
 because it uses the standard ip command.

 Thanks,

 2013/3/25 Nick Walke tubaguy50...@gmail.com:
  This the correct place to report bugs?
  https://github.com/ClusterLabs/resource-agents
 
  Nick
 
 
  On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de
 wrote:
 
  Hello Nick,
 
   I shouldn't be able to do that if the IPv6 module wasn't loaded,
   correct?
 
  that is correct. I tried modifying my netmask to copy yours. And I get
  the same error, you do:
 
  ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete):
  unknown error
 
  So probably a bug in the resource agent. Manually adding and removing
  works:
 
  (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
  (node-62) [~] ip -6 addr show dev eth0
  2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
  inet6 2a01:4f8:bb:400::2/116 scope global
 valid_lft forever preferred_lft forever
  inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
 valid_lft 2591887sec preferred_lft 604687sec
  inet6 fe80::225:90ff:fe97:dbb0/64 scope link
 valid_lft forever preferred_lft forever
  (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0
 
  Nick, you can do the following things to resolve this:
 
  - Hunt down the bug and fix it or let someone else do it for you
 
  - Use another netmask, if possible (fighting the symptoms
 instead
  of
resolving the root cause)
 
  - Write your own resource agent (fighting the symptoms instead
 of
resolving the root cause)
 
  Cheers,
  Thomas
 



 --
 Keisuke MORI




-- 
Keisuke MORI


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-25 Thread Nick Walke
That works.  Thanks!

Nick


On Mon, Mar 25, 2013 at 4:22 AM, Keisuke MORI keisuke.mori...@gmail.comwrote:

 2013/3/25 Nick Walke tubaguy50...@gmail.com:
  Looks like 3.9.2-5.  So I need to statically assign the address I want to
  use before using it with IPv6addr?

 Yes.


  On Mar 25, 2013 3:44 AM, Keisuke MORI keisuke.mori...@gmail.com
 wrote:
 
  Hi Nick,
 
  Could you privide which version of resource-agents you're using?
 
  Prior to 3.9.2, IPv6addr requires a static IPv6 address with the
  exactly same prefix to find out an apropriate nic; so you should have
  statically assigned   2600:3c00::34:c003/116 on eth0 for example.
 
  As of 3.9.3, it has relaxed and the specified nic is always used no
  matter if the prefix does not match; so it should just work. (at least
  it works for me)
 
  Alternatively, as of 3.9.5, you can also use IPaddr2 for managing a
  virtual IPv6 address, which is brand new and I would prefer this
  because it uses the standard ip command.
 
  Thanks,
 
  2013/3/25 Nick Walke tubaguy50...@gmail.com:
   This the correct place to report bugs?
   https://github.com/ClusterLabs/resource-agents
  
   Nick
  
  
   On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann 
 tho...@glanzmann.de
  wrote:
  
   Hello Nick,
  
I shouldn't be able to do that if the IPv6 module wasn't loaded,
correct?
  
   that is correct. I tried modifying my netmask to copy yours. And I
 get
   the same error, you do:
  
   ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete):
   unknown error
  
   So probably a bug in the resource agent. Manually adding and removing
   works:
  
   (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
   (node-62) [~] ip -6 addr show dev eth0
   2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
   inet6 2a01:4f8:bb:400::2/116 scope global
  valid_lft forever preferred_lft forever
   inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
  valid_lft 2591887sec preferred_lft 604687sec
   inet6 fe80::225:90ff:fe97:dbb0/64 scope link
  valid_lft forever preferred_lft forever
   (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0
  
   Nick, you can do the following things to resolve this:
  
   - Hunt down the bug and fix it or let someone else do it for
 you
  
   - Use another netmask, if possible (fighting the symptoms
  instead
   of
 resolving the root cause)
  
   - Write your own resource agent (fighting the symptoms
 instead
  of
 resolving the root cause)
  
   Cheers,
   Thomas
  
 
 
 
  --
  Keisuke MORI
 



 --
 Keisuke MORI



Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Thomas Glanzmann
Hello,

 ipv6addr=2600:3c00::0034:c007

from the manpage of ocf_heartbeat_IPv6addr it looks like that you have
to specify the netmask so try:

ipv6addr=2600:3c00::0034:c007/64, assuming that you're in a /64.

Cheers,
Thomas


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Nick Walke
Thanks for the tip, however, it did not work.  That's actually a /116.  So
I put in 2600:3c00::0034:c007/116 and am getting the same error.  I
requested that it restart the resource as well, just to make sure it wasn't
the previous error.

Nick


On Sun, Mar 24, 2013 at 3:55 AM, Thomas Glanzmann tho...@glanzmann.dewrote:

 Hello,

  ipv6addr=2600:3c00::0034:c007

 from the manpage of ocf_heartbeat_IPv6addr it looks like that you have
 to specify the netmask so try:

 ipv6addr=2600:3c00::0034:c007/64 assuiming that you're in a /64.

 Cheers,
 Thomas



Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread emmanuel segura
Hello Nick

Try to use nic=eth0 instead of nic=eth0:3

thanks

2013/3/24 Nick Walke tubaguy50...@gmail.com

 Thanks for the tip, however, it did not work.  That's actually a /116.  So
 I put in 2600:3c00::0034:c007/116 and am getting the same error.  I
 requested that it restart the resource as well, just to make sure it wasn't
 the previous error.

 Nick


 On Sun, Mar 24, 2013 at 3:55 AM, Thomas Glanzmann tho...@glanzmann.de
 wrote:

  Hello,
 
   ipv6addr=2600:3c00::0034:c007
 
  from the manpage of ocf_heartbeat_IPv6addr it looks like that you have
  to specify the netmask so try:
 
  ipv6addr=2600:3c00::0034:c007/64 assuiming that you're in a /64.
 
  Cheers,
  Thomas
 




-- 
this is my life and I live it as long as God wills


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Thomas Glanzmann
Hello Nick,

 Thanks for the tip, however, it did not work.  That's actually a /116.
 So I put in 2600:3c00::0034:c007/116 and am getting the same
 error.  I requested that it restart the resource as well, just to make
 sure it wasn't the previous error.

now, I had to try it:

node $id=9d9b62d2-405d-459a-a724-cb2643d7d9a1 node-62
primitive ipv6test ocf:heartbeat:IPv6addr \
params ipv6addr=2a01:4f8:bb:400::2/64 \
op monitor interval=15 timeout=15 \
meta target-role=Started
property $id=cib-bootstrap-options \
dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \
cluster-infrastructure=Heartbeat \
stonith-enabled=false

And it works:

(node-62) [~] ifconfig
eth0  Link encap:Ethernet  HWaddr 00:25:90:97:db:b0
  inet addr:10.100.4.62  Bcast:10.100.255.255  Mask:255.255.0.0
  inet6 addr: 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 Scope:Global
  inet6 addr: fe80::225:90ff:fe97:dbb0/64 Scope:Link
  inet6 addr: 2a01:4f8:bb:400::2/64 Scope:Global
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:40345 errors:0 dropped:0 overruns:0 frame:0
  TX packets:10270 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:52540127 (50.1 MiB)  TX bytes:1127817 (1.0 MiB)
  Memory:fb58-fb60

(infra) [~] traceroute 2a01:4f8:bb:400::2
traceroute to 2a01:4f8:bb:400::2 (2a01:4f8:bb:400::2), 30 hops max, 80 byte 
packets
 1  merlin.glanzmann.de (2a01:4f8:bb:4ff::1)  1.413 ms  1.550 ms  1.791 ms
 2  2a01:4f8:bb:400::2 (2a01:4f8:bb:400::2)  0.204 ms  0.202 ms  0.270 ms

Cheers,
Thomas


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Greg Woods
On Sun, 2013-03-24 at 01:36 -0700, tubaguy50035 wrote:

 params ipv6addr=2600:3c00::0034:c007 nic=eth0:3 \

Are you sure that's a valid IPv6 address? I get headaches every time I
look at these, but it seems a valid address is 8 groups, and you've got
5 there. Maybe you mean 2600:3c00::0034:c007?

--Greg



Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Nick Walke
I don't know what I'm doing wrong then.  I copied exactly what you put in
and now I'm getting these errors:

ipv6test_start_0 (node=tek-lin-lb1, call=25, rc=1, status=complete):
unknown error
ipv6test_start_0 (node=tek-lin-lb2, call=20, rc=1, status=complete):
unknown error

Looking in my syslog I see:

Mar 24 14:37:13 tek-lin-lb2 IPv6addr: [8038]: ERROR: no valid mecahnisms
Mar 24 14:37:13 tek-lin-lb2 lrmd: [3005]: info: operation start[18] on
ipv6test for client 3008: pid 8038 exited with return code 1
Mar 24 14:37:13 tek-lin-lb2 crmd: [3008]: info: process_lrm_event: LRM
operation ipv6test_start_0 (call=18, rc=1, cib-update=65, confirmed=true)
unknown error

Anything I need to do to allow IPv6... or something?



Nick


On Sun, Mar 24, 2013 at 4:29 AM, Thomas Glanzmann tho...@glanzmann.dewrote:

 Hello Nick,

  Thanks for the tip, however, it did not work.  That's actually a /116.
  So I put in 2600:3c00::0034:c007/116 and am getting the same
  error.  I requested that it restart the resource as well, just to make
  sure it wasn't the previous error.

 now, I had to try it:

 node $id=9d9b62d2-405d-459a-a724-cb2643d7d9a1 node-62
 primitive ipv6test ocf:heartbeat:IPv6addr \
 params ipv6addr=2a01:4f8:bb:400::2/64 \
 op monitor interval=15 timeout=15 \
 meta target-role=Started
 property $id=cib-bootstrap-options \
 dc-version=1.1.7-ee0730e13d124c3d58f00016c3376a1de5323cff \
 cluster-infrastructure=Heartbeat \
 stonith-enabled=false

 And it works:

 (node-62) [~] ifconfig
 eth0  Link encap:Ethernet  HWaddr 00:25:90:97:db:b0
   inet addr:10.100.4.62  Bcast:10.100.255.255  Mask:255.255.0.0
   inet6 addr: 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 Scope:Global
   inet6 addr: fe80::225:90ff:fe97:dbb0/64 Scope:Link
   inet6 addr: 2a01:4f8:bb:400::2/64 Scope:Global
   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
   RX packets:40345 errors:0 dropped:0 overruns:0 frame:0
   TX packets:10270 errors:0 dropped:0 overruns:0 carrier:0
   collisions:0 txqueuelen:1000
   RX bytes:52540127 (50.1 MiB)  TX bytes:1127817 (1.0 MiB)
   Memory:fb58-fb60

 (infra) [~] traceroute 2a01:4f8:bb:400::2
 traceroute to 2a01:4f8:bb:400::2 (2a01:4f8:bb:400::2), 30 hops max, 80
 byte packets
  1  merlin.glanzmann.de (2a01:4f8:bb:4ff::1)  1.413 ms  1.550 ms  1.791 ms
  2  2a01:4f8:bb:400::2 (2a01:4f8:bb:400::2)  0.204 ms  0.202 ms  0.270 ms

 Cheers,
 Thomas
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Thomas Glanzmann
Hello Nick,

 Anything I need to do to allow IPv6... or something?

I agree with Greg here. Have you tried setting the address manually?

ip -6 addr add ip/cidr dev eth0
ip -6 addr show dev eth0
ip -6 addr del ip/cidr dev eth0
ip -6 addr show dev eth0

(node-62) [~] ip -6 addr add 2a01:4f8:bb:400::3/64 dev eth0
(node-62) [~] ip -6 addr show dev eth0
2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
inet6 2a01:4f8:bb:400::3/64 scope global
   valid_lft forever preferred_lft forever
inet6 2a01:4f8:bb:400::2/64 scope global
   valid_lft forever preferred_lft forever
inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
   valid_lft 2591998sec preferred_lft 604798sec
inet6 fe80::225:90ff:fe97:dbb0/64 scope link
   valid_lft forever preferred_lft forever
(node-62) [~] ip -6 addr del 2a01:4f8:bb:400::3/64 dev eth0
(node-62) [~] ip -6 addr show dev eth0
2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
inet6 2a01:4f8:bb:400::2/64 scope global
   valid_lft forever preferred_lft forever
inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
   valid_lft 2591990sec preferred_lft 604790sec
inet6 fe80::225:90ff:fe97:dbb0/64 scope link
   valid_lft forever preferred_lft forever

Do you see a link-local address on your eth0? A link-local address is one that
starts with fe80::. Otherwise, try loading the ipv6 module:

modprobe ipv6 # Don't know if that is the right module name; all my
              # kernels have ipv6 built in (Debian wheezy / squeeze / backports)
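A couple of read-only checks along those lines (a sketch; on Linux, the presence of /proc/net/if_inet6 indicates the kernel has IPv6 support, and the helper name and optional path argument are ours for illustration):

```shell
# Does the kernel have IPv6 at all?
# Accepts an alternate path argument so the logic can be exercised anywhere.
has_ipv6() {
    [ -e "${1:-/proc/net/if_inet6}" ]
}

has_ipv6 && echo "kernel IPv6: yes" || echo "kernel IPv6: no (try: modprobe ipv6)"

# Is there a link-local (fe80::...) address on eth0?
ip -6 addr show dev eth0 2>/dev/null | grep -q 'inet6 fe80::' \
    && echo "link-local address present on eth0" \
    || echo "no link-local address found on eth0"
```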

Cheers,
Thomas
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Nick Walke
From the first node:

nick@tek-lin-lb1:~$ sudo ip -6 addr add 2600:3c00::34:c007/116 dev eth0

nick@tek-lin-lb1:~$ sudo ip -6 addr show dev eth0
3: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
inet6 2600:3c00::34:c007/116 scope global
   valid_lft forever preferred_lft forever
inet6 2600:3c00::f03c:91ff:fe70:7541/64 scope global dynamic
   valid_lft 43200sec preferred_lft 43200sec
inet6 2600:3c00::34:c003/64 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::f03c:91ff:fe70:7541/64 scope link
   valid_lft forever preferred_lft forever

nick@tek-lin-lb1:~$ sudo ip -6 addr del 2600:3c00::34:c007/116 dev eth0

nick@tek-lin-lb1:~$ sudo ip -6 addr show dev eth0
3: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
inet6 2600:3c00::f03c:91ff:fe70:7541/64 scope global dynamic
   valid_lft 43200sec preferred_lft 43200sec
inet6 2600:3c00::34:c003/64 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::f03c:91ff:fe70:7541/64 scope link
   valid_lft forever preferred_lft forever



From the second node:

nick@tek-lin-lb2:~$ sudo ip -6 addr add 2600:3c00::34:c007/116 dev eth0

nick@tek-lin-lb2:~$ sudo ip -6 addr show dev eth0
3: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
inet6 2600:3c00::34:c007/116 scope global
   valid_lft forever preferred_lft forever
inet6 2600:3c00::f03c:91ff:fe70:f0a4/64 scope global dynamic
   valid_lft 43190sec preferred_lft 43190sec
inet6 2600:3c00::34:c005/64 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::f03c:91ff:fe70:f0a4/64 scope link
   valid_lft forever preferred_lft forever

nick@tek-lin-lb2:~$ sudo ip -6 addr del 2600:3c00::34:c007/116 dev eth0

nick@tek-lin-lb2:~$ sudo ip -6 addr show dev eth0
3: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
inet6 2600:3c00::f03c:91ff:fe70:f0a4/64 scope global dynamic
   valid_lft 43197sec preferred_lft 43197sec
inet6 2600:3c00::34:c005/64 scope global
   valid_lft forever preferred_lft forever
inet6 fe80::f03c:91ff:fe70:f0a4/64 scope link
   valid_lft forever preferred_lft forever

I shouldn't be able to do that if the IPv6 module wasn't loaded, correct?
So it seems like it is.



Nick


On Sun, Mar 24, 2013 at 3:16 PM, Thomas Glanzmann tho...@glanzmann.de wrote:

 Hello Nick,

  Anything I need to do to allow IPv6... or something?

 I agree with Greg here. Have you tried setting the address manually?

 ip -6 addr add ip/cidr dev eth0
 ip -6 addr show dev eth0
 ip -6 addr del ip/cidr dev eth0
 ip -6 addr show dev eth0

 (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::3/64 dev eth0
 (node-62) [~] ip -6 addr show dev eth0
 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
 inet6 2a01:4f8:bb:400::3/64 scope global
valid_lft forever preferred_lft forever
 inet6 2a01:4f8:bb:400::2/64 scope global
valid_lft forever preferred_lft forever
 inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
valid_lft 2591998sec preferred_lft 604798sec
 inet6 fe80::225:90ff:fe97:dbb0/64 scope link
valid_lft forever preferred_lft forever
 (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::3/64 dev eth0
 (node-62) [~] ip -6 addr show dev eth0
 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
 inet6 2a01:4f8:bb:400::2/64 scope global
valid_lft forever preferred_lft forever
 inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
valid_lft 2591990sec preferred_lft 604790sec
 inet6 fe80::225:90ff:fe97:dbb0/64 scope link
valid_lft forever preferred_lft forever

 Do you see a link local address on your eth0? A link local address is one
 that
 starts with fe80:: otherwise try loading the ipv6 module:

 modprobe ipv6 # Don't know if that is the right module name, all my
   # kernels have ipv6 build in (Debian wheezy /
 squeeze / backports)

 Cheers,
 Thomas
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Thomas Glanzmann
Hello Nick,

 I shouldn't be able to do that if the IPv6 module wasn't loaded,
 correct?

that is correct. I tried modifying my netmask to match yours, and I get
the same error you do:

ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete): unknown 
error

So probably a bug in the resource agent. Manually adding and removing
works:

(node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
(node-62) [~] ip -6 addr show dev eth0
2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
inet6 2a01:4f8:bb:400::2/116 scope global
   valid_lft forever preferred_lft forever
inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
   valid_lft 2591887sec preferred_lft 604687sec
inet6 fe80::225:90ff:fe97:dbb0/64 scope link
   valid_lft forever preferred_lft forever
(node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0

Nick, you can do the following things to resolve this:

- Hunt down the bug and fix it or let someone else do it for you

- Use another netmask, if possible (fighting the symptoms instead of
  resolving the root cause)

- Write your own resource agent (fighting the symptoms instead of
  resolving the root cause)
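To hunt the bug down, one option is to run the agent by hand with the OCF environment variables set, so the failure is visible outside the cluster. A sketch, with assumptions: the install path below is the usual resource-agents location (verify it on your system), and the OCF_RESKEY_* names mirror the "params" used in the crm configuration above:

```shell
# Run the IPv6addr agent directly to reproduce the "unknown error" outside lrmd.
RA=/usr/lib/ocf/resource.d/heartbeat/IPv6addr   # assumed install path

export OCF_ROOT=/usr/lib/ocf                    # assumed OCF root
export OCF_RESKEY_ipv6addr=2600:3c00::34:c007   # address from this thread
export OCF_RESKEY_cidr_netmask=116

if [ -x "$RA" ]; then
    "$RA" start || echo "start failed with exit code $?"
else
    echo "agent not found at $RA"
fi
```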

Cheers,
Thomas
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat IPv6addr OCF

2013-03-24 Thread Nick Walke
Is this the correct place to report bugs?
https://github.com/ClusterLabs/resource-agents

Nick


On Sun, Mar 24, 2013 at 10:45 PM, Thomas Glanzmann tho...@glanzmann.de wrote:

 Hello Nick,

  I shouldn't be able to do that if the IPv6 module wasn't loaded,
  correct?

 that is correct. I tried modifying my netmask to copy yours. And I get
 the same error, you do:

 ipv6test_start_0 (node=node-62, call=6, rc=1, status=complete):
 unknown error

 So probably a bug in the resource agent. Manually adding and removing
 works:

 (node-62) [~] ip -6 addr add 2a01:4f8:bb:400::2/116 dev eth0
 (node-62) [~] ip -6 addr show dev eth0
 2: eth0: BROADCAST,MULTICAST,UP,LOWER_UP mtu 1500 qlen 1000
 inet6 2a01:4f8:bb:400::2/116 scope global
valid_lft forever preferred_lft forever
 inet6 2a01:4f8:bb:400:225:90ff:fe97:dbb0/64 scope global dynamic
valid_lft 2591887sec preferred_lft 604687sec
 inet6 fe80::225:90ff:fe97:dbb0/64 scope link
valid_lft forever preferred_lft forever
 (node-62) [~] ip -6 addr del 2a01:4f8:bb:400::2/116 dev eth0

 Nick, you can do the following things to resolve this:

 - Hunt down the bug and fix it or let someone else do it for you

 - Use another netmask, if possible (fighting the symptoms instead
 of
   resolving the root cause)

 - Write your own resource agent (fighting the symptoms instead of
   resolving the root cause)

 Cheers,
 Thomas
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect

2012-12-13 Thread Dejan Muhamedagic
On Thu, Dec 13, 2012 at 12:41:29AM +0100, Lars Marowsky-Bree wrote:
 On 2012-12-13T10:31:55, Andrew Beekhof and...@beekhof.net wrote:
 
   We once moved the ocf-shellfuncs file, which didn't work out here when
  I thought we never did this sort of thing because we don't know how
  people are using our stuff externally.
 
 We did it in a backwards-compatible manner; or at least if the packagers
 choose to, they could symlink the old location to the new one. (That is
 the default for the included spec files, I think.)

Right. Whichever way some other RA may have used the shellfuncs,
it would continue to work with the new package; that obviously
needed to be supported. The old filenames started with '.',
which set a precedent that was not well received by all distributions.

Thanks,

Dejan

 So yes, we try hard to never break updating. And to provide migration
 over several releases. None of the functions changed names, all
 variables still there, we don't drop agent attributes, that kind of
 stuff.
 
 But copying a new agent that has the new path embedded obviously doesn't
 work in the old environment.
 
 If you were trying to be snarky, I think this failed. ;-)
 
 
 Regards,
 Lars
 
 -- 
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect

2012-12-12 Thread Michael Schwartzkopff
On Monday, 10 December 2012 at 15:59:16, codey koble wrote:
 To anyone who could help possibly:
 
 My current setup:
 2 Ubuntu 10.04 LTS servers running heartbeat, pacemaker, apache, and mysql
 Heartbeat and pacemaker are running great for my needs with one exception,
 currently both nodes are showing mysql as slaves.
 I have mysql configured in a master/slave setup and that is working great
 on its own.
 
 I noticed when I tried to promote one of the servers that an error occurred
 stating that the ocf:heartbeat:mysql did not support the feature.  I
 evaluated the script and realized it was an older version and did not
 contain any of the promote/demote code.  I found the newest code for the
 script in the github repo and replaced the entire mysql file with the new
 code.  Upon doing this it then gave an error stating that the
 ocf:heartbeat:mysql resource agent was not installed.
 
 My question would be is there a simple way to update the script instead of
 manually replacing it like I did, or is there a way to get the code I
 changed to working?
 
 Thanks in advance for any help!

It seems that you have three options:

1) Go back to the old script and use it as a primitive resource, not a 
Master/Slave resource.

2) Keep the new script and debug why it does not work in your environment.
Perhaps some PATH is set wrong or some packages are not installed.

3) Upgrade to 12.04 LTS. This version should reflect recent developments in 
the cluster software.

Perhaps you try option 2) first but in the mid term go for option 3).

-- 
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect

2012-12-12 Thread Fabian Herschel

On 12/10/2012 10:59 PM, codey koble wrote:
 To anyone who could help possibly:
 
 My current setup: 2 Ubuntu 10.04 LTS servers running heartbeat,
 pacemaker, apache, and mysql Heartbeat and pacemaker are running
 great for my needs with one exception, currently both nodes are
 showing mysql as slaves. I have mysql configured in a master/slave
 setup and that is working great on its own.
 
 I noticed when I tried to promote one of the servers that an error
 occurred stating that the ocf:heartbeat:mysql did not support the
 feature.  I evaluated the script and realized it was an older
 version and did not contain any of the promote/demote code.  I
 found the newest code for the script in the github repo and
 replaced the entire mysql file with the new code.  Upon doing this
 it then gave an error stating that the ocf:heartbeat:mysql resource
 agent was not installed.

Could you send a more precise error message? Does the cluster tell
you the RA is not installed (check path and file permissions), or does
the LRM report that the RA itself returned a not installed exit code
(which would mean the RA cannot find your mysql binaries/config/or
whatever)?
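The first case can be ruled out quickly from the shell. A sketch (the helper name is ours, and the RA path is the usual resource-agents location; verify it on your system):

```shell
# Report whether a resource agent file exists and is executable.
check_ra() {
    if [ ! -e "$1" ]; then
        echo "missing"
        return 1
    fi
    if [ -x "$1" ]; then echo "executable"; else echo "not executable"; fi
}

check_ra /usr/lib/ocf/resource.d/heartbeat/mysql || true
```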

 
 My question would be is there a simple way to update the script
 instead of manually replacing it like I did, or is there a way to
 get the code I changed to working?
 
 Thanks in advance for any help! 
 ___ Linux-HA mailing
 list Linux-HA@lists.linux-ha.org 
 http://lists.linux-ha.org/mailman/listinfo/linux-ha See also:
 http://linux-ha.org/ReportingProblems
 

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect

2012-12-12 Thread Lars Marowsky-Bree
On 2012-12-12T13:58:25, Fabian Herschel fabian.hersc...@arcor.de wrote:

  I noticed when I tried to promote one of the servers that an error
  occurred stating that the ocf:heartbeat:mysql did not support the
  feature.  I evaluated the script and realized it was an older
  version and did not contain any of the promote/demote code.  I
  found the newest code for the script in the github repo and
  replaced the entire mysql file with the new code.  Upon doing this
  it then gave an error stating that the ocf:heartbeat:mysql resource
  agent was not installed.
 Could you send the error message more precise? Does the cluster tell
 you the RA si not installed  (check path and file permissions) or does
 the LRM tell that the RA itself has returned a exit code not
 installed (this would mean the RA does not find your mysql
 binaries/config/or whatever)?

We once moved the ocf-shellfuncs file, which doesn't work out when
only a single script is updated and not the whole package.

I suggest upgrading the whole package and then investigating.


Kind regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect

2012-12-12 Thread Andrew Beekhof
On Thu, Dec 13, 2012 at 12:24 AM, Lars Marowsky-Bree l...@suse.com wrote:
 On 2012-12-12T13:58:25, Fabian Herschel fabian.hersc...@arcor.de wrote:

  I noticed when I tried to promote one of the servers that an error
  occurred stating that the ocf:heartbeat:mysql did not support the
  feature.  I evaluated the script and realized it was an older
  version and did not contain any of the promote/demote code.  I
  found the newest code for the script in the github repo and
  replaced the entire mysql file with the new code.  Upon doing this
  it then gave an error stating that the ocf:heartbeat:mysql resource
  agent was not installed.
 Could you send the error message more precise? Does the cluster tell
 you the RA si not installed  (check path and file permissions) or does
 the LRM tell that the RA itself has returned a exit code not
 installed (this would mean the RA does not find your mysql
 binaries/config/or whatever)?

 We once moved the ocf-shellfuncs file, which didn't work out here when

I thought we never did this sort of thing because we don't know how
people are using our stuff externally.

 only a single script is updated and not the whole package.

 I suggest to upgrade the whole package and then investigate.


 Mit freundlichen Grüßen,
 Lars

 --
 Architect Storage/HA
 SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, 
 HRB 21284 (AG Nürnberg)
 Experience is the name everyone gives to their mistakes. -- Oscar Wilde

 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat/Pacemaker resource agent incorrect

2012-12-12 Thread Lars Marowsky-Bree
On 2012-12-13T10:31:55, Andrew Beekhof and...@beekhof.net wrote:

  We once moved the ocf-shellfuncs file, which didn't work out here when
 I thought we never did this sort of thing because we don't know how
 people are using our stuff externally.

We did it in a backwards-compatible manner; or at least if the packagers
choose to, they could symlink the old location to the new one. (That is
the default for the included spec files, I think.)

So yes, we try hard to never break updating. And to provide migration
over several releases. None of the functions changed names, all
variables still there, we don't drop agent attributes, that kind of
stuff.

But copying a new agent that has the new path embedded obviously doesn't
work in the old environment.

If you were trying to be snarky, I think this failed. ;-)


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat with Oracle's ASM

2012-11-15 Thread Digimer
On 11/15/2012 05:00 AM, Hill Fang wrote:
 Hi friend:
 
 I want to know: does heartbeat support Oracle ASM now?

The heartbeat project has been deprecated for some time. There are no
plans to continue its development. I am unsure of its supported state
on Oracle, but regardless, I would advise you to plan to use corosync.

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without
access to education?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat with Oracle's ASM

2012-11-15 Thread Serge Dubrouski
There is an RA for Oracle that can be used with Pacemaker. Generally ASM
behaves like a regular Oracle instance, so you can try it.
 On Nov 15, 2012 8:57 AM, Hill Fang hill.f...@ericsson.com wrote:

 Hi friend:

 I want to know: does heartbeat support Oracle ASM now?




 HILL FANG
 Engineer

 Guangzhou Ericsson Communication Services Co.,Ltd.(GTC)
 SI Support
 2 /F, NO. 1025 Gaopu Road, Tianhe Software Park,Tianhe District, Guangzhou,
 510663, PR China
 Phone +86 020-85117631
 Fax +86 020-29002699
 SMS/MMS 15813329521
 hill.f...@ericsson.com
 www.ericsson.com


 [http://www.ericsson.com/]http://www.ericsson.com/

 This Communication is Confidential. We only send and receive email on the
 basis of the terms set out at www.ericsson.com/email_disclaimer
 http://www.ericsson.com/email_disclaimer


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat with Oracle's ASM

2012-11-15 Thread Lars Marowsky-Bree
On 2012-11-15T10:00:21, Hill Fang hill.f...@ericsson.com wrote:

 Hi friend:
 
 I want to know: does heartbeat support Oracle ASM now?

No - and yes.

Oracle RAC (I assume that's the context for ASM?) does not tolerate any
cluster solution except itself. This is not supported together with
Pacemaker.

Pacemaker with the Oracle resource agent can manage a single instance
fail-over for Oracle, yes. That is supported. Postgres/MySQL too.


Regards,
Lars

-- 
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 
21284 (AG Nürnberg)
Experience is the name everyone gives to their mistakes. -- Oscar Wilde

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat not starting when both nodes are down

2012-10-09 Thread Nicolás
El 08/10/2012 20:56, Andreas Kurz escribió:
 On 10/08/2012 09:42 PM, Nicolás wrote:
 El 28/09/2012 20:42, Nicolás escribió:
 Hi all!

 I'm new to this list, I've been looking to get some info about this but
 I haven't seen anything, so I'm trying this way.

 I've successfully configured a 2-node cluster with DRBD + Heartbeat +
 Pacemaker. It works as expected.

 The problem comes when both nodes are down. Having this, after powering
 on one of the nodes, I can see it configuring the network but after this
 I never see the console for this machine. So I try to connect via SSH
 and realize that Heartbeat is not running. After I run it manually I can
 see the console for this node. This only happens when BOTH nodes are
 down. When just one is, everything goes right as Heartbeat starts
 automatically on the powering-on node.

 I see nothing relevant in logs, my conf is as follows:

 root@cluster1:~# cat /etc/ha.d/ha.cf | grep -e ^[^#]
 logfacility local0
 ucast eth1 192.168.0.91
 ucast eth0 192.168.20.51
 auto_failback on
node cluster1.gamez.es cluster2.gamez.es
 use_logd yes
 crm  on
 autojoin none

 Any ideas on what am I doing wrong?
 [...]

 For a new cluster use Corosync and not Heartbeat, disable DRBD init
 script and configure it as a Pacemaker master-slave resource.


Thanks for this! Once I disabled DRBD init script it worked as it should.

Regards,

Nicolás

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat not starting when both nodes are down

2012-10-08 Thread Nicolás
El 28/09/2012 20:42, Nicolás escribió:
 Hi all!

 I'm new to this list, I've been looking to get some info about this but
 I haven't seen anything, so I'm trying this way.

 I've successfully configured a 2-node cluster with DRBD + Heartbeat +
 Pacemaker. It works as expected.

 The problem comes when both nodes are down. Having this, after powering
 on one of the nodes, I can see it configuring the network but after this
 I never see the console for this machine. So I try to connect via SSH
 and realize that Heartbeat is not running. After I run it manually I can
 see the console for this node. This only happens when BOTH nodes are
 down. When just one is, everything goes right as Heartbeat starts
 automatically on the powering-on node.

 I see nothing relevant in logs, my conf is as follows:

 root@cluster1:~# cat /etc/ha.d/ha.cf | grep -e ^[^#]
 logfacility local0
 ucast eth1 192.168.0.91
 ucast eth0 192.168.20.51
 auto_failback on
 node cluster1.gamez.es cluster2.gamez.es
 use_logd yes
 crm  on
 autojoin none

 Any ideas on what am I doing wrong?

 Thanks a lot in advance.

 Nicolás
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

Any ideas with this?

Thanks!
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat not starting when both nodes are down

2012-10-08 Thread Andreas Kurz
On 10/08/2012 09:42 PM, Nicolás wrote:
 El 28/09/2012 20:42, Nicolás escribió:
 Hi all!

 I'm new to this list, I've been looking to get some info about this but
 I haven't seen anything, so I'm trying this way.

 I've successfully configured a 2-node cluster with DRBD + Heartbeat +
 Pacemaker. It works as expected.

 The problem comes when both nodes are down. Having this, after powering
 on one of the nodes, I can see it configuring the network but after this
 I never see the console for this machine. So I try to connect via SSH
 and realize that Heartbeat is not running. After I run it manually I can
 see the console for this node. This only happens when BOTH nodes are
 down. When just one is, everything goes right as Heartbeat starts
 automatically on the powering-on node.

 I see nothing relevant in logs, my conf is as follows:

 root@cluster1:~# cat /etc/ha.d/ha.cf | grep -e ^[^#]
 logfacility local0
 ucast eth1 192.168.0.91
 ucast eth0 192.168.20.51
 auto_failback on
 node cluster1.gamez.es cluster2.gamez.es
 use_logd yes
 crm  on
 autojoin none

 Any ideas on what am I doing wrong?

Looks like an enabled DRBD init script with default startup-timeout
parameters ... that script blocks until the peer is connected or a
timeout expires -- by default forever (depending on some configuration
parameters) or until manual confirmation on the console ... as heartbeat
is typically last in the boot process, it is not (yet) started.

For a new cluster use Corosync and not Heartbeat, disable DRBD init
script and configure it as a Pacemaker master-slave resource.
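For reference, the shape of that setup looks roughly like this (a sketch, not a drop-in config; the resource names and the DRBD resource "r0" are placeholders, and the monitor intervals are illustrative):

```
# Disable the DRBD init script so the cluster controls DRBD:
#   update-rc.d -f drbd remove        (Debian/Ubuntu)

# Pacemaker master-slave resource for a DRBD resource named "r0":
primitive p_drbd_r0 ocf:linbit:drbd \
        params drbd_resource="r0" \
        op monitor interval="29" role="Master" \
        op monitor interval="31" role="Slave"
ms ms_drbd_r0 p_drbd_r0 \
        meta master-max="1" master-node-max="1" \
        clone-max="2" clone-node-max="1" notify="true"
```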

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now


 Thanks a lot in advance.

 Nicolás
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 
 Any ideas with this?
 
 Thanks!
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 







signature.asc
Description: OpenPGP digital signature
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] heartbeat and n-to1 clusters

2012-08-07 Thread Andrew Beekhof
On Tue, Aug 7, 2012 at 1:42 AM, Andy Furtado awf...@yahoo.com wrote:
 Hello,


 Is it possible to setup an n-to-1 cluster configuration and have heartbeat 
 manage a different VIP for each virtual pair.
 The n-to-1 configuration would have a single slave node, able to take over 
 for any one of the failed N masters at a time.

You'll want pacemaker on top of heartbeat for that.

   http://www.clusterlabs.org
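With pacemaker, the n-to-1 behaviour can be approximated with one IPaddr2 resource per master, a location preference for each VIP's home node, the slave as common fallback, and a mutual anti-colocation so the slave only ever stands in for one master at a time. A rough sketch in crm shell syntax (node names and addresses taken from the example below; the scores are illustrative):

```
primitive vip_a ocf:heartbeat:IPaddr2 params ip="10.1.1.101"
primitive vip_b ocf:heartbeat:IPaddr2 params ip="10.1.1.102"
location vip_a_home     vip_a 100: masternodeA
location vip_a_fallback vip_a  50: slavenode
location vip_b_home     vip_b 100: masternodeB
location vip_b_fallback vip_b  50: slavenode
# Keep the two VIPs on different nodes, so the slave never carries both:
colocation vip_separate -inf: vip_a vip_b
```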

 In this configuration each masternode would have a staticip-addr and a VIP. 
 When the master fails, the VIP for that master is configured on the 
 slavenode, and the slave acts as that master.
 Once the slavenode is acting as a master, it remains in this state and cannot 
 takeover of another failed master until the original masternode is restored, 
 and the original slave transitions back to the slave state.


 Example configuration
 masternodeA
 staticip:10.1.1.1
 VIP:10.1.1.101


 masternodeB
 staticip:10.1.1.2

 VIP: 10.1.1.102


 slavenode
 staticip:10.1.1.3


 If masternodeA fails, slavenode becomes active as masternodeA and is 
 configured with VIP 10.1.1.101
 If masternodeB fails, there is no failover available since slavenode is 
 currently acting as masternodeA


 When masternodeA is restored, slavenode releases VIP 10.1.1.101, and is now 
 ready to take over for either masternodeA or masternodeB.


 I understand this is not an ideal fail over solution, but one I must live 
 with until further design can be done.

 I've searched the internet, and the HA mailing lists without much success.

 Any info, or input would be appreciated.


 Best Regards,
 Andy
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Error

2012-08-05 Thread Andrew Beekhof
On Fri, Aug 3, 2012 at 5:18 PM, Yount, William D
yount.will...@menloworldwide.com wrote:
 I am using pacemaker and corosync. For some reason I keep getting this error 
 in my messages log:

 ERROR: Cannot chdir to [/var/lib/heartbeat/cores/root]: No such file or 
 directory

 Should I not worry about that, since I am using corosync and not heartbeat?

Pacemaker (until a few days ago) used this directory even when run
with corosync.
Best to create it.
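A minimal sketch of "create it"; the path comes from the error message above, and the helper is parameterized only so it can be tried outside a cluster node:

```shell
# Recreate the per-user core-dump directory that pacemaker/heartbeat
# chdirs into. On a real node (as root) this would be:
#   make_cores_dir /var/lib/heartbeat/cores root
make_cores_dir() {
    # $1: cores base directory, $2: user name
    mkdir -p "$1/$2" && chmod 700 "$1/$2" && echo "created $1/$2"
}
```

As noted later in this thread, more recent pacemaker versions create the leaf directory themselves on startup.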



 William

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Error [Solved]

2012-08-05 Thread Andrew Beekhof
More recent versions will create the leaf directory for you when
pacemaker starts.

On Fri, Aug 3, 2012 at 5:39 PM, Yount, William D
yount.will...@menloworldwide.com wrote:
 I was able to fix the error by creating the directory manually. 
 /var/lib/heartbeat/cores was already there; I just added the root subdirectory.

 Kind of an odd problem though.


 -Original Message-
 From: linux-ha-boun...@lists.linux-ha.org 
 [mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Yount, William D
 Sent: Friday, August 03, 2012 2:18 AM
 To: linux-ha@lists.linux-ha.org
 Subject: [Linux-HA] Heartbeat Error

 I am using pacemaker and corosync. For some reason I keep getting this error 
 in my messages log:

 ERROR: Cannot chdir to [/var/lib/heartbeat/cores/root]: No such file or 
 directory

 Should I not worry about that, since I am using corosync and not heartbeat?


 William

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Error [Solved]

2012-08-03 Thread Yount, William D
I was able to fix the error by creating the directory manually. 
/var/lib/heartbeat/cores was already there; I just added the root subdirectory.

Kind of an odd problem though.


-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Yount, William D
Sent: Friday, August 03, 2012 2:18 AM
To: linux-ha@lists.linux-ha.org
Subject: [Linux-HA] Heartbeat Error

I am using pacemaker and corosync. For some reason I keep getting this error in 
my messages log:

ERROR: Cannot chdir to [/var/lib/heartbeat/cores/root]: No such file or 
directory

Should I not worry about that, since I am using corosync and not heartbeat?


William

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat isn't switching to the 2nd node when Httpd is down!

2012-07-31 Thread Lars Ellenberg
On Tue, Jul 24, 2012 at 04:01:40PM +0100, Aboubakr Seddik Ouahabi wrote:
 Hey there, I've created a thread somewhere, but I guess this is the right
 place to seek help for this, and here is my issue as stated there:
 
 
 OK guys, that was very much appreciated and I thank you again. For now, I
 just want to get heartbeat to function as it should, and I don't want to
 create a whole new thread for it.
 
 As I said before, I have one public IP to access the server, and 2 nodes
 with 2 internal IPs, both connected via eth0. What I want exactly is: if
 either httpd or MySQL goes down, the second node should take control and
 the virtual IP should be assigned to it, until everything is in sync
 again; then the primary or preferred node should take over again.
 
 Heartbeat is starting just fine, detecting the 2 nodes; then I tried to
 shut down one of them to see what it would say:
 
  Code:
 
 cl_status nodestatus node02
 dead
 
 And it found it was dead, but the failover isn't happening. I've tried to:
 
  Code:
 
 service httpd stop
 
 On node01, but it didn't switch anything to anything, so what have I been
 missing in my config? Here is the config I've tried in my ha.cf:
 
  Code:
 
 # Logging
  debug  1
  use_logd  true
  logfacility  daemon
 
  # Misc Options
  traditional_compression  off
  compression  bz2
  coredumps  true
 
  # Communications
  udpport  21xxx
  bcast  eth0
  ucast  eth0 10.25.45.81
  ucast  eth0 10.25.45.82
 
  autojoin   any
 
  # Thresholds (in seconds)
  keepalive  1
  warntime   6
  deadtime   10
  initdead   15
 
 crm respawn
 
 node node01
 node node02
 
 And I've tried 2 combinations for my cib.xml:

Learn to use the crm shell; it's so much easier on the eyes...

 
 1:
  Code:
 
 <cib>
   <configuration>

I think you are missing no-quorum-policy=ignore
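For context: in a two-node cluster the surviving node loses quorum when its peer dies, and with the default policy it stops running resources. A sketch of the crm-shell command (shown here only as a hint for a live crm/pacemaker cluster, not a drop-in fix):

```
crm configure property no-quorum-policy=ignore
```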


-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat isn't switching to the 2nd node when Httpd is down!

2012-07-29 Thread Andrew Beekhof
On Wed, Jul 25, 2012 at 1:01 AM, Aboubakr Seddik Ouahabi
ouaha...@gmail.com wrote:
 Hey there, I've created a thread somewhere, but I guess this is the right
 place to seek help for this, and here is my issue as stated there:


 Ok guys, that was very much appreciated and I thank you again. For now, I
 just want to get heartbeat to function as it should and I don't want to
 create a whole new thread for it.

 As I said before, I have one public IP to access the server, and 2 nodes
 with 2 internal IPs, both connected via eth0. What I want exactly is: if
 either httpd or MySQL goes down, the second node should take control and
 the virtual IP should be assigned to it,

Are apache and mysql intended to be running on both machines at the same time?

Btw. haresources is not used for crm/pacemaker clusters

 until everything is in
 sync again; then the primary or preferred node should take over
 again.

 Heartbeat is starting just fine, detecting the 2 nodes; then I tried to
 shut down one of them to see what it would say:

  Code:

 cl_status nodestatus node02
 dead

 And it found it was dead, but the failover isn't happening. I've tried to:

  Code:

 service httpd stop

 On node01, but it didn't switch anything to anything, so what have I been
 missing in my config? Here is the config I've tried in my ha.cf:

  Code:

 # Logging
  debug  1
  use_logd  true
  logfacility  daemon

  # Misc Options
  traditional_compression  off
  compression  bz2
  coredumps  true

  # Communications
  udpport  21xxx
  bcast  eth0
  ucast  eth0 10.25.45.81
  ucast  eth0 10.25.45.82

  autojoin   any

  # Thresholds (in seconds)
  keepalive  1
  warntime   6
  deadtime   10
  initdead   15

 crm respawn

 node node01
 node node02

 And I've tried 2 combinations for my cib.xml:

 1:
  Code:

 <cib>
   <configuration>

     <crm_config/>
     <nodes/>
     <resources>

 ###
 ###

       <group id="group_apache">
         <primitive id="ipaddr" class="ocf" type="IPaddr" provider="heartbeat">
           <instance_attributes id="ia_ipaddr">
             <attributes>
               <nvpair id="ia_ipaddr_ip" name="ip" value="91.xxx.xxx.xx"/>
               <nvpair id="ia_ipaddr_nic" name="nic" value="eth0"/>
               <nvpair id="ia_ipaddr_netmask" name="netmask" value="24"/>
             </attributes>
           </instance_attributes>
         </primitive>
         <primitive id="apache" class="ocf" type="apache" provider="heartbeat">
           <instance_attributes id="ia_apache">
             <attributes>
               <nvpair id="ia_apache_configfile" name="configfile"
                       value="/etc/httpd/conf/httpd.conf"/>
             </attributes>
           </instance_attributes>
         </primitive>
       </group>

 #
 #

 <group id="node01">

   <primitive class="ocf" id="IP1" provider="heartbeat" type="IPaddr">
     <operations>
       <op id="IP1_mon" interval="10s" name="monitor" timeout="5s"/>
     </operations>
     <instance_attributes id="IP1_inst_attr">
       <attributes>
         <nvpair id="IP1_attr_0" name="ip" value="10.25.45.81"/>
         <nvpair id="IP1_attr_1" name="netmask" value="255.255.255.0"/>
         <nvpair id="IP1_attr_2" name="nic" value="eth0"/>
       </attributes>
     </instance_attributes>
   </primitive>

   <primitive class="lsb" id="httpd1" provider="heartbeat" type="httpd">
     <operations>
       <op id="jboss1_mon" interval="30s" name="monitor" timeout="20s"/>
     </operations>
   </primitive>
 </group>

 <group id="node02">

   <primitive class="ocf" id="IP2" provider="heartbeat" type="IPaddr">
     <operations>
       <op id="IP2_mon" interval="10s" name="monitor" timeout="5s"/>
     </operations>
     <instance_attributes id="IP2_inst_attr">
       <attributes>
         <nvpair id="IP2_attr_0" name="ip" value="10.25.45.82"/>
         <nvpair id="IP2_attr_1" name="netmask" value="255.255.255.0"/>
         <nvpair id="IP2_attr_2" name="nic" value="eth0"/>
       </attributes>
     </instance_attributes>
   </primitive>
   <primitive class="lsb" id="httpd2" provider="heartbeat" type="httpd">
     <operations>
       <op id="jboss2_mon" interval="30s" name="monitor" timeout="20s"/>
     </operations>
   </primitive>
 </group>
 </resources>

 <constraints>
   <rsc_location id="location_server1" rsc="node01">
     <rule id="best_location_server1" score="100">
       <expression attribute="node01" id="best_location_server1_expr"
                   operation="eq"
                   value="10.25.45.81"/>
     </rule>
   </rsc_location>

   <rsc_location id="location_server2" rsc="node02">
     <rule id="best_location_server2" score="100">
       <expression attribute="node02" id="best_location_server2_expr"
                   operation="eq"

Re: [Linux-HA] Heartbeat over VPN

2012-07-12 Thread Dejan Muhamedagic
Hi,

On Wed, Jul 11, 2012 at 04:24:42AM +0700, Nanang Purnomo wrote:
 I want to implement a failover cluster server with heartbeat, but the
 problem is that I use a VPN network. Can heartbeat be run across two
 different networks?

Sure. Just make sure that the port is open and that various
parameters fit your network.

Now, if it's a two-node cluster, you need a stonith solution
which runs over another, independent medium. If that's not
possible, you'll need an arbitrator at a third site.
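As an illustration, a hedged ha.cf fragment for a heartbeat link over a VPN tunnel; the interface name, peer address, and timings are assumptions to be tuned to the tunnel's real latency:

```
# heartbeat over the VPN interface (interface/address are illustrative)
ucast tun0 10.8.0.2
udpport 694
# WAN/VPN latency is higher than LAN, so relax the timers
keepalive 2
warntime 10
deadtime 30
```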

Thanks,

Dejan

 I hope you can give me a solution, please.
 
 
 Best Regards,
 Nanang
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat question about multiple services

2012-05-09 Thread Nikita Michalko
Am Freitag, 20. April 2012 12:42:16 schrieb sgm:
 Hi,
 I have a question about heartbeat: if I have three services, apache, mysql
  and sendmail, and apache is down, heartbeat will switch all the services to
  the standby server, right?
It depends on the configuration - that behaviour is also possible ...

   If so, how do I configure heartbeat to avoid this
   happening?

You can configure your 2 services (mysql and sendmail, for example) with 
colocation constraints, or as a group - there are many possibilities.
Did you already RTFM (read the f... manuals)?
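As a crm-shell sketch (the primitive names are invented for illustration): tie just mysql and sendmail together, or put the pair in a group; apache then fails over on its own without dragging the others along:

```
# keep sendmail where mysql runs, independent of apache
colocation mail-with-db inf: p_sendmail p_mysql
# ...or, equivalently for two resources, as a group:
# group db-mail p_mysql p_sendmail
```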


 Very Appreciated.gm
 

HTH

Nikita Michalko 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat question about multiple services

2012-05-09 Thread RaSca
On Fri, 20 Apr 2012 12:42:16 CEST, sgm wrote:
 Hi,
 I have a question about heartbeat: if I have three services, apache, mysql 
 and sendmail, and apache is down, heartbeat will switch all the services to the 
 standby server, right?
 If so, how do I configure heartbeat to avoid this happening?
 Very Appreciated.gm

You may want to start from here:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch/

-- 
RaSca
Mia Mamma Usa Linux: Nothing is impossible to understand, if you explain it well!
ra...@miamammausalinux.org
http://www.miamammausalinux.org
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat question about multiple services

2012-05-09 Thread David Gersic
 On 4/20/2012 at 05:42 AM, sgm sgm...@yahoo.com.cn wrote: 
 Hi,
 I have a question about heartbeat: if I have three services, apache, mysql 
 and sendmail, and apache is down, heartbeat will switch all the services to the 
 standby server, right?

Maybe. It depends on how you have built and configured your cluster.



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat strange behavior

2012-05-07 Thread Douglas Pasqua
Thanks Lars..

Problem solved. I changed the asterisk init script to be idempotent.

Regards,
Douglas
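The idempotency fix can be sketched like this: heartbeat's ResourceManager treats a non-zero return from "start" as failure, so a start action on an already-running daemon must succeed. The pidfile path and the sleep stand-in daemon below are illustrative only, not the real asterisk script:

```shell
# Idempotent LSB-style start/stop keyed on a pidfile (demo values).
PIDFILE="${PIDFILE:-/tmp/demo-daemon.pid}"

is_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

start() {
    if is_running; then
        echo "already running"   # success (RC=0), not the RC=1 seen above
        return 0
    fi
    sleep 300 >/dev/null 2>&1 &  # stand-in for the real daemon
    echo $! > "$PIDFILE"
    echo "started"
}

stop() {
    is_running && kill "$(cat "$PIDFILE")" 2>/dev/null
    rm -f "$PIDFILE"
    echo "stopped"
}

start   # first start launches the daemon
start   # repeated start is a no-op success
stop
```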

On Wed, May 2, 2012 at 9:25 AM, Lars Ellenberg lars.ellenb...@linbit.comwrote:

 On Mon, Apr 30, 2012 at 01:52:05PM -0300, Douglas Pasqua wrote:
  Hi friends,
 
  I created a Linux HA solution using 2 nodes: node-a and node-b.
 
  My /etc/ha.d/ha.cf:
 
  use_logd yes
  keepalive 1
  deadtime 90
  warntime 5
  initdead 120
  bcast eth6
  node node-a
  node node-b
  crm off
  auto_failback off
 
  My /etc/ha.d/haresources
  node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 service1 service2 service3
 
  I booted the two nodes together. node-a became master and node-b became
  slave. Then I rebooted node-a, and node-b became master. When node-a
  returned from the boot, it became slave, because *auto_failback is off*, I
 think.
  All as expected until here.
 
  With node-a as a slave, I decided to halt node-a (using the halt
 command).
  Then heartbeat on node-b went standby and my cluster was down. The virtual
  IPs were down too. I expected node-b to stay on. Why did this happen?
 
  Some log from node2:
 
  Apr 30 00:02:57 node-b heartbeat: [3082]: info: Received shutdown notice
  from 'node-a'.
  Apr 30 00:02:57 node-b heartbeat: [3082]: info: Resources being acquired
  from node-a.
  Apr 30 00:02:57 node-b heartbeat: [4414]: debug: notify_world: setting
  SIGCHLD Handler to SIG_DFL
  Apr 30 00:02:57 node-b harc[4414]: [4428]: info: Running
  /etc/ha.d/rc.d/status status
  Apr 30 00:02:57 node-b heartbeat: [4416]: info: No local resources
  [/usr/share/heartbeat/ResourceManager listkeys node-b] to acquire.
  Apr 30 00:02:57 node-b heartbeat: [3082]: debug: StartNextRemoteRscReq():
  child count 1
 
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4657]: debug:
  /etc/init.d/asterisk  start done. RC=1
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4658]: ERROR: Return code
 1
  from /etc/init.d/asterisk
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4659]: CRIT: Giving up
  resources due to failure of asterisk

 Because of the above error when starting asterisk.  Maybe your asterisk
 init script is simply not idempotent.  Maybe it is broken, or maybe
 there really was some problem trying to start asterisk.


  Apr 30 00:02:58 node-b ResourceManager[4462]: [4660]: info: Releasing
  resource group: node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 asterisk
  sincronismo notificacao
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4670]: info: Running
  /etc/init.d/notificacao  stop
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4671]: debug: Starting
  /etc/init.d/notificacao  stop
 
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4694]: debug:
  /etc/init.d/notificacao  stop done. RC=0
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4704]: info: Running
  /etc/init.d/sincronismo  stop
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4705]: debug: Starting
  /etc/init.d/sincronismo  stop
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4711]: debug:
  /etc/init.d/sincronismo  stop done. RC=0
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4720]: info: Running
  /etc/init.d/asterisk  stop
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4721]: debug: Starting
  /etc/init.d/asterisk  stop
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4725]: debug:
  /etc/init.d/asterisk  stop done. RC=0
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4741]: info: Running
  /etc/ha.d/resource.d/IPaddr x.x.x.x/24 stop
  Apr 30 00:02:58 node-b ResourceManager[4462]: [4742]: debug: Starting
  /etc/ha.d/resource.d/IPaddr x.x.x.x/24 stop
 
  Apr 30 00:03:29 node-b heartbeat: [3082]: info: node-b wants to go
 standby
  [foreign]
  Apr 30 00:03:39 node-b heartbeat: [3082]: WARN: No reply to standby
  request.  Standby request cancelled.
  Apr 30 00:04:29 node-b heartbeat: [3082]: WARN: node node-a: is dead
  Apr 30 00:04:29 node-b heartbeat: [3082]: info: Dead node node-a gave up
  resources.
  Apr 30 00:04:29 node-b heartbeat: [3082]: info: Link node-a:eth6 dead.

 --
 : Lars Ellenberg
 : LINBIT | Your Way to High Availability
 : DRBD/HA support and consulting http://www.linbit.com

 DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat strange behavior

2012-05-02 Thread Lars Ellenberg
On Mon, Apr 30, 2012 at 01:52:05PM -0300, Douglas Pasqua wrote:
 Hi friends,
 
 I created a Linux HA solution using 2 nodes: node-a and node-b.
 
 My /etc/ha.d/ha.cf:
 
 use_logd yes
 keepalive 1
 deadtime 90
 warntime 5
 initdead 120
 bcast eth6
 node node-a
 node node-b
 crm off
 auto_failback off
 
 My /etc/ha.d/haresources
 node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 service1 service2 service3
 
 I booted the two nodes together. node-a became master and node-b became
 slave. Then I rebooted node-a, and node-b became master. When node-a
 returned from the boot, it became slave, because *auto_failback is off*, I think.
 All as expected until here.
 
 With node-a as a slave, I decided to halt node-a (using the halt command).
 Then heartbeat on node-b went standby and my cluster was down. The virtual
 IPs were down too. I expected node-b to stay on. Why did this happen?
 
 Some log from node2:
 
 Apr 30 00:02:57 node-b heartbeat: [3082]: info: Received shutdown notice
 from 'node-a'.
 Apr 30 00:02:57 node-b heartbeat: [3082]: info: Resources being acquired
 from node-a.
 Apr 30 00:02:57 node-b heartbeat: [4414]: debug: notify_world: setting
 SIGCHLD Handler to SIG_DFL
 Apr 30 00:02:57 node-b harc[4414]: [4428]: info: Running
 /etc/ha.d/rc.d/status status
 Apr 30 00:02:57 node-b heartbeat: [4416]: info: No local resources
 [/usr/share/heartbeat/ResourceManager listkeys node-b] to acquire.
 Apr 30 00:02:57 node-b heartbeat: [3082]: debug: StartNextRemoteRscReq():
 child count 1
 
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4657]: debug:
 /etc/init.d/asterisk  start done. RC=1
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4658]: ERROR: Return code 1
 from /etc/init.d/asterisk
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4659]: CRIT: Giving up
 resources due to failure of asterisk

Because of the above error when starting asterisk.  Maybe your asterisk
init script is simply not idempotent.  Maybe it is broken, or maybe
there really was some problem trying to start asterisk.


 Apr 30 00:02:58 node-b ResourceManager[4462]: [4660]: info: Releasing
 resource group: node-a x.x.x.x/24 x.x.x.x/24 x.x.x.x/24 asterisk
 sincronismo notificacao
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4670]: info: Running
 /etc/init.d/notificacao  stop
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4671]: debug: Starting
 /etc/init.d/notificacao  stop
 
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4694]: debug:
 /etc/init.d/notificacao  stop done. RC=0
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4704]: info: Running
 /etc/init.d/sincronismo  stop
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4705]: debug: Starting
 /etc/init.d/sincronismo  stop
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4711]: debug:
 /etc/init.d/sincronismo  stop done. RC=0
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4720]: info: Running
 /etc/init.d/asterisk  stop
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4721]: debug: Starting
 /etc/init.d/asterisk  stop
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4725]: debug:
 /etc/init.d/asterisk  stop done. RC=0
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4741]: info: Running
 /etc/ha.d/resource.d/IPaddr x.x.x.x/24 stop
 Apr 30 00:02:58 node-b ResourceManager[4462]: [4742]: debug: Starting
 /etc/ha.d/resource.d/IPaddr x.x.x.x/24 stop
 
 Apr 30 00:03:29 node-b heartbeat: [3082]: info: node-b wants to go standby
 [foreign]
 Apr 30 00:03:39 node-b heartbeat: [3082]: WARN: No reply to standby
 request.  Standby request cancelled.
 Apr 30 00:04:29 node-b heartbeat: [3082]: WARN: node node-a: is dead
 Apr 30 00:04:29 node-b heartbeat: [3082]: info: Dead node node-a gave up
 resources.
 Apr 30 00:04:29 node-b heartbeat: [3082]: info: Link node-a:eth6 dead.

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Failover Configuration Question

2012-04-23 Thread Marcus Bointon
On 23 Apr 2012, at 02:23, Net Warrior wrote:

 auto_failback on

No. As far as I'm aware this is to control what happens when your initial node 
recovers. If you have 2 nodes, a and b, and a is active, but then fails, b will 
take over, but when a is fixed and recovers, heartbeat will 'fail back' to a 
automatically if this property is on. You might want this if a is a 
faster/better server.
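In ha.cf (haresources mode) the toggle looks like this; a sketch for a two-node a/b pair as in Marcus's description:

```
# with "on", resources move back to node a as soon as it recovers;
# with "off", they stay on the takeover node until it fails itself
auto_failback off
node a
node b
```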

Marcus
-- 
Marcus Bointon
Synchromedia Limited: Creators of http://www.smartmessages.net/
UK info@hand CRM solutions
mar...@synchromedia.co.uk | http://www.synchromedia.co.uk/



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Failover Configuration Question

2012-04-23 Thread Nikita Michalko
Hi, Net Warrior!


What version of HA/Pacemaker do you use?
Did you already RTFM - e.g. 
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained
- or:
http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch


HTH


Nikita Michalko 

Am Montag, 23. April 2012 02:23:20 schrieb Net Warrior:
 Hi There
 
 I configured heartbeat to fail over an IP address. If, for example, I
 shut down one node, the other takes its IP address; so far so good. Now
 my doubt is whether there is a way to configure it not to perform the failover
 automatically and have someone run the failover manually. Can you provide
 a configuration example, please? Is this stanza the one that does the
 magic?
 
 auto_failback on
 
 
 Thanks for your time and support
 Best regards
 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Failover Configuration Question

2012-04-23 Thread Net Warrior
Hi Nikita

This is the version
heartbeat-3.0.0-0.7

My aim is that, if node1 is powered off or loses its Ethernet
connection, node2 won't perform the failover automatically; I want to
trigger it manually, but I could not find how to accomplish that.


Thanks for your time and support
Best regards



2012/4/23, Nikita Michalko michalko.sys...@a-i-p.com:
 Hi, Net Warrior!


 What version of HA/Pacemaker do you use?
 Did you already RTFM - e.g.
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained
 - or:
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch


 HTH


 Nikita Michalko

 Am Montag, 23. April 2012 02:23:20 schrieb Net Warrior:
 Hi There

 I configured heartbeat to failover an IP address  , if I for example
 shutdown one node, the other takes it's ip address, so far so good, now
 my doubt is if there is a way to configure it not to make the failover
 automatically and have someone run the failover manually, can you provide
 any configuration example please? is this stanza the one that does the
 magic?

 auto_failback on


 Thanks for your time and support
 Best regards

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Failover Configuration Question

2012-04-23 Thread David Coulson
Why even use heartbeat then? Just manually ifconfig the interface.

On 4/23/12 7:39 AM, Net Warrior wrote:
 Hi Nikita

 This is the version
 heartbeat-3.0.0-0.7

 My aim is to, if node1 is powered off or losts it's ethernet
 connection,. node2 wont make the failover automatically,  I want to
 make it manually, but could not find how to accomplish that.


 Thanks for your time and support
 Best regards



 2012/4/23, Nikita Michalkomichalko.sys...@a-i-p.com:
 Hi, Net Warrior!


 What version of HA/Pacemaker do you use?
 Did you already RTFM - e.g.
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained
 - or:
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch


 HTH


 Nikita Michalko

 Am Montag, 23. April 2012 02:23:20 schrieb Net Warrior:
 Hi There

 I configured heartbeat to failover an IP address  , if I for example
 shutdown one node, the other takes it's ip address, so far so good, now
 my doubt is if there is a way to configure it not to make the failover
 automatically and have someone run the failover manually, can you provide
 any configuration example please? is this stanza the one that does the
 magic?

 auto_failback on


 Thanks for your time and support
 Best regards
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Failover Configuration Question

2012-04-23 Thread Net Warrior
True, but even in the most expensive software, like Veritas Cluster or
Red Hat Cluster, I can configure how I want to fail over the resources
(auto or manual); that's why I'm curious how to accomplish the same
here.

Thanks for your time
Best Regards

2012/4/23, David Coulson da...@davidcoulson.net:
 Why even use heartbeat then - Just manually ifconfig the interface.

 On 4/23/12 7:39 AM, Net Warrior wrote:
 Hi Nikita

 This is the version
 heartbeat-3.0.0-0.7

 My aim is to, if node1 is powered off or losts it's ethernet
 connection,. node2 wont make the failover automatically,  I want to
 make it manually, but could not find how to accomplish that.


 Thanks for your time and support
 Best regards



 2012/4/23, Nikita Michalkomichalko.sys...@a-i-p.com:
 Hi, Net Warrior!


 What version of HA/Pacemaker do you use?
 Did you already RTFM - e.g.
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained
 - or:
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch


 HTH


 Nikita Michalko

 Am Montag, 23. April 2012 02:23:20 schrieb Net Warrior:
 Hi There

 I configured heartbeat to failover an IP address  , if I for example
 shutdown one node, the other takes it's ip address, so far so good, now
 my doubt is if there is a way to configure it not to make the failover
 automatically and have someone run the failover manually, can you
 provide
 any configuration example please? is this stanza the one that does the
 magic?

 auto_failback on


 Thanks for your time and support
 Best regards

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Failover Configuration Question

2012-04-23 Thread Andreas Kurz
On 04/23/2012 01:47 PM, Net Warrior wrote:
 True, but even on the most expensive software likve Veritas Cluster or
 Red Hat Cluster I can configure how I want to failover the resources (
 auto or manual ), that's why my curiosity to acomplish the same in
 here.

With the help of the meatware stonith plugin, a manual acknowledgement of
the failover process is required.
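A hedged crm-shell sketch of such a meatware resource (node names are illustrative); after manually resetting the failed node, the operator confirms with meatclient, and only then does the failover proceed:

```
# "meatware" stonith waits for a human to power-cycle the node
primitive st-meat stonith:meatware params hostlist="node1 node2"
clone st-meat-clone st-meat
# operator acknowledgement after resetting node1 by hand:
#   meatclient -c node1
```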

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now

 
 Thanks for your time
 Best Regards
 
 2012/4/23, David Coulson da...@davidcoulson.net:
 Why even use heartbeat then - Just manually ifconfig the interface.

 On 4/23/12 7:39 AM, Net Warrior wrote:
 Hi Nikita

 This is the version
 heartbeat-3.0.0-0.7

 My aim is to, if node1 is powered off or losts it's ethernet
 connection,. node2 wont make the failover automatically,  I want to
 make it manually, but could not find how to accomplish that.


 Thanks for your time and support
 Best regards



 2012/4/23, Nikita Michalkomichalko.sys...@a-i-p.com:
 Hi, Net Warrior!


 What version of HA/Pacemaker do you use?
 Did you already RTFM - e.g.
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained
 - or:
 http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Clusters_from_Scratch


 HTH


 Nikita Michalko

 Am Montag, 23. April 2012 02:23:20 schrieb Net Warrior:
 Hi There

 I configured heartbeat to failover an IP address  , if I for example
 shutdown one node, the other takes it's ip address, so far so good, now
 my doubt is if there is a way to configure it not to make the failover
 automatically and have someone run the failover manually, can you
 provide
 any configuration example please? is this stanza the one that does the
 magic?

 auto_failback on


 Thanks for your time and support
 Best regards




___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] heartbeat doesnt create the socket /var/run/heartbeat/register

2012-01-22 Thread Efrat Lefeber
The heartbeat I install is from debian packages.
dpkg -l | grep  heartbeat
ii  heartbeat      1:3.0.3-2~bpo50+1  Subsystem for High-Availability Linux
ii  libheartbeat2  1:3.0.3-2~bpo50+1  Subsystem for High-Availability Linux (libraries)

version 3.0.2

I install the same packages and builds on all devices via an automatic 
installation. Some devices install fine and some suffer from the problem 
that the socket isn't created.
Is there a way I can create the socket from outside heartbeat (from Perl or 
bash)? I have a watchdog, and I wish to create the socket automatically in case 
it doesn't exist.

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of Lars Ellenberg
Sent: Friday, January 20, 2012 8:48 PM
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat doesnt create the socket 
/var/run/heartbeat/register

On Thu, Jan 19, 2012 at 02:18:53PM +, Efrat Lefeber wrote:
 Hi,
 
 I am using linux-ha heartbeat on a two simple nodes cluster.
 For some reason which I can't figure out, the socket 
 /var/run/heartbeat/register is not created though the directory 
 /var/run/heartbeat/ exist:
 
 ll /var/run/heartbeat/
 total 24
 drwxr-x---  6 hacluster haclient 4096 2012-01-19 14:30 .
 drwxr-xr-x 16 root  root 4096 2012-01-19 14:30 ..
 drwxr-x---  2 hacluster haclient 4096 2012-01-19 14:30 ccm
 drwxr-x---  2 hacluster haclient 4096 2012-01-19 14:30 crm
 drwxr-x---  2 hacluster haclient 4096 2012-01-19 14:30 dopd
 drwxr-xr-t  2 root  root 4096 2012-01-19 14:30 rsctmp
 
 
 /etc/init.d/heartbeat status
 heartbeat OK [pid 14685 et al] is running on vs-158 [vs-158]...
 
 cl_status hbstatus
 Heartbeat is stopped on this machine.
 
 I ran cl_status with strace and I saw this error:
 connect(3, {sa_family=AF_FILE, path=/var/run/heartbeat/register...}, 
 110) = -1 ENOENT (No such file or directory)
 
 
 Who created this socket?

That's one of the first things the heartbeat binary does when it starts. If it 
cannot create that socket, heartbeat will not even start up.

Of course, in theory someone may remove that socket after it was created. If 
so, make sure that does not happen again ;)

 How can I find out why isn't the socket created?

Where did you get your packages/binaries?
Double check your build?
lsof -n -p your heartbeat master control process?

 Is there a workaround I can do to create the socket?

Fix your installation.

 This problem doesn't happen all the time. I have another node with the 
 same configuration and the socket was created there.

Same packages and build?

--
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com 
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat doesnt create the socket /var/run/heartbeat/register

2012-01-20 Thread Lars Ellenberg
On Thu, Jan 19, 2012 at 02:18:53PM +, Efrat Lefeber wrote:
 Hi,
 
 I am using linux-ha heartbeat on a two simple nodes cluster.
 For some reason which I can't figure out, the socket 
 /var/run/heartbeat/register is not created though the directory 
 /var/run/heartbeat/ exist:
 
 ll /var/run/heartbeat/
 total 24
 drwxr-x---  6 hacluster haclient 4096 2012-01-19 14:30 .
 drwxr-xr-x 16 root  root 4096 2012-01-19 14:30 ..
 drwxr-x---  2 hacluster haclient 4096 2012-01-19 14:30 ccm
 drwxr-x---  2 hacluster haclient 4096 2012-01-19 14:30 crm
 drwxr-x---  2 hacluster haclient 4096 2012-01-19 14:30 dopd
 drwxr-xr-t  2 root  root 4096 2012-01-19 14:30 rsctmp
 
 
 /etc/init.d/heartbeat status
 heartbeat OK [pid 14685 et al] is running on vs-158 [vs-158]...
 
 cl_status hbstatus
 Heartbeat is stopped on this machine.
 
 I ran cl_status with strace and I saw this error:
 connect(3, {sa_family=AF_FILE, path=/var/run/heartbeat/register...}, 110) = 
 -1 ENOENT (No such file or directory)
 
 
 Who created this socket?

That's one of the first things the heartbeat binary does when it starts.
If it cannot create that socket, heartbeat will not even start up.

Of course, in theory someone may remove that socket after it was
created. If so, make sure that does not happen again ;)

 How can I find out why isn't the socket created?

Where did you get your packages/binaries?
Double check your build?
lsof -n -p your heartbeat master control process?

 Is there a workaround I can do to create the socket?

Fix your installation.

 This problem doesn't happen all the time. I have another node with the
 same configuration and the socket was created there.

Same packages and build?

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
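Lars's lsof suggestion can be wrapped in a small check. A minimal sketch, assuming the default /var/run/heartbeat/register path from the thread (the helper name is illustrative, not part of heartbeat):

```shell
#!/bin/sh
# Minimal sketch: verify the heartbeat IPC socket exists and really is a
# socket before digging further with lsof. The path is the default from
# the thread; check_hb_socket is an illustrative name, not a heartbeat tool.
check_hb_socket() {
    sock="${1:-/var/run/heartbeat/register}"
    if [ -S "$sock" ]; then
        echo "present"
    else
        echo "missing"
        return 1
    fi
}
```

If this reports "missing" while heartbeat processes are running, the installation is broken in the way Lars describes.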


Re: [Linux-HA] [Heartbeat][Pacemaker] VIP doesn't swith to other server

2011-11-18 Thread Andreas Kurz
Hello Mathieu,

On 11/17/2011 07:22 PM, SEILLIER Mathieu wrote:
 Hi all,
 
 I have to use Heartbeat with Pacemaker for High Availability between 2 Tomcat 
 5.5 servers under Linux RedHat 5.4.
 The first server is active, the other one is passive. The master is called 
 servappli01, with IP address 186.20.100.81, the slave is called servappli02, 
 with IP address 186.20.100.82.
 I configured a virtual IP 186.20.100.83. Each Tomcat is not launched when 
 server is started, this is Heartbeat which starts Tomcat when it's running.
 All seem to be OK, each server see the other as active, and the crm_mon 
 command shows this below :
 
 
 Last updated: Thu Nov 17 19:03:34 2011
 Stack: Heartbeat
 Current DC: servappli01 (bf8e9a46-8691-4838-82d9-942a13aeedca) - partition 
 with quorum
 Version: 1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87
 2 Nodes configured, 2 expected votes
 2 Resources configured.
 
 
 Online: [ servappli01 servappli02 ]
 
  Clone Set: ClusterIPClone (unique)
  ClusterIP:0(ocf::heartbeat:IPaddr2):   Started servappli01
  ClusterIP:1(ocf::heartbeat:IPaddr2):   Started servappli02

You did not configure just a simple VIP but a cluster IP, which acts
like a simple static load balancer ... see man iptables and search for CLUSTERIP.

If this was not your intention, simply don't clone it.

If you want a cluster IP you have to choose the correct meta attributes:

clone ClusterIPClone ClusterIP \
    meta globally-unique="true" clone-node-max="2" interleave="true"

  Clone Set: TomcatClone (unique)
  Tomcat:0   (ocf::heartbeat:tomcat):Started servappli01
  Tomcat:1   (ocf::heartbeat:tomcat):Started servappli02
 
 
 The 2 Tomcat servers are identical, and the same webapps are deployed on each 
 server in order to be able to access the webapps on the other server if one is 
 down.
 By default, requests from clients are processed by the first server because 
 it's the master.
 My problem is that when I crash the Tomcat on the first server, requests from 
 clients are not redirected to the second server. For a while, requests are 
 not processed, then Heartbeat restarts Tomcat itself and requests are 
 processed again by the first server.
 Requests are never forwarded to the second Tomcat if the first is down.

Default behavior on monitoring errors is a local restart. If you always
test from the same IP I would expect your requests to fail while Tomcat
is not running on the one node you are redirected to ... so if you choose
the clusterip_hash sourceip-sourceport your chance should be 50/50 to
get redirected ... if you want a real load balancer you might want to
integrate a service like ldirectord, with realserver checks, to remove a
non-working service from the load balancing.

... use ip addr show or define a label to see your VIP ...

Regards,
Andreas

-- 
Need help with Pacemaker?
http://www.hastexo.com/now
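Andreas's two points combined, as a crm configure sketch. The clone meta attributes are the ones he gives above; the primitive body (IP from the thread, netmask, timings, and the clusterip_hash value) is an assumption for illustration, not the poster's actual configuration:

```
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="186.20.100.83" cidr_netmask="24" \
        clusterip_hash="sourceip-sourceport" \
    op monitor interval="30s"
clone ClusterIPClone ClusterIP \
    meta globally-unique="true" clone-node-max="2" interleave="true"
```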

 
 Here is my configuration :
 
 ha.cf file (the same on each server) :
 
 crm respawn
 logfacility local0
 logfile /var/log/ha-log
 debugfile   /var/log/ha-debug
 warntime10
 deadtime20
 initdead120
 keepalive   2
 autojoinnone
 nodeservappli01
 nodeservappli02
 ucast   eth0 186.20.100.81 # ignored by node1 (owner of ip)
 ucast   eth0 186.20.100.82 # ignored by node2 (owner of ip)
 
 cib.xml file (the same on each server) :
 
 <?xml version="1.0" ?>
 <cib admin_epoch="0" crm_feature_set="3.0.1"
  dc-uuid="bf8e9a46-8691-4838-82d9-942a13aeedca" epoch="127" have-quorum="1"
  num_updates="51" validate-with="pacemaker-1.0">
   <configuration>
     <crm_config>
       <cluster_property_set id="cib-bootstrap-options">
         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
          value="1.0.11-1554a83db0d3c3e546cfd3aaff6af1184f79ee87"/>
         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
          name="cluster-infrastructure" value="Heartbeat"/>
         <nvpair id="cib-bootstrap-options-expected-quorum-votes"
          name="expected-quorum-votes" value="2"/>
         <nvpair id="cib-bootstrap-options-no-quorum-policy"
          name="no-quorum-policy" value="ignore"/>
         <nvpair id="cib-bootstrap-options-stonith-enabled"
          name="stonith-enabled" value="false"/>
       </cluster_property_set>
     </crm_config>
     <nodes>
       <node id="489a0305-862a-4280-bce5-6defa329df3f" type="normal"
        uname="servappli01"/>
       <node id="bf8e9a46-8691-4838-82d9-942a13aeedca" type="normal"
        uname="servappli02"/>
     </nodes>
     <resources>
       <clone id="TomcatClone">
         <meta_attributes id="TomcatClone-meta_attributes">
           <nvpair id="TomcatClone-meta_attributes-globally-unique"
            name="globally-unique" value="true"/>
         </meta_attributes>
         <primitive class="ocf" id="Tomcat" provider="heartbeat" type="tomcat">
           <instance_attributes id="Tomcat-instance_attributes">
             <nvpair id="Tomcat-instance_attributes-tomcat_name"
              name="tomcat_name" value="TomcatSBNG"/>
             <nvpair 

Re: [Linux-HA] heartbeat and squid

2011-09-14 Thread Dejan Muhamedagic
Hi,

On Thu, Sep 01, 2011 at 06:30:46PM +0200, Nicolas Repentin wrote:
 Hi all,
 
 I've got a question about heartbeat. 
 How can I achieve this:
 
 If squid stops or is killed on node1, how do I make node2 become master?
 
 Actually, node2 becomes master only when node1 is down, or the heartbeat
 service on node1 is down; but if I kill squid, nothing happens.
 
 I'm using Centos 6 and last heartbeat version.

Using just heartbeat and no pacemaker? Only pacemaker has service
monitoring.

Thanks,

Dejan

 Thanks a lot for your responses !
 
 
 -- 
 Nicolas
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
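Dejan's point in practice: with Pacemaker on top of heartbeat (crm respawn in ha.cf), squid can be monitored and failed over. A sketch using the stock Squid resource agent from resource-agents (resource name, paths, port, and timings are illustrative and must match your installation):

```
primitive p_squid ocf:heartbeat:Squid \
    params squid_exe="/usr/sbin/squid" squid_conf="/etc/squid/squid.conf" \
        squid_pidfile="/var/run/squid.pid" squid_port="3128" \
    op monitor interval="20s" timeout="30s"
```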


Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start

2011-08-04 Thread Rahul Kanna
Mike,

I checked the permission and those are fine.

If you could please check the restart script I have given below: it does not
touch the heartbeat lock file

*touch $LOCKDIR/$SUBSYS*

when heartbeat is restarted, and I guess that is a problem. Is it not?

Btw, we have a product for a web application, and as part of it we allow
administrators to configure servers as redundant servers; underneath we
use Linux-HA to set up the redundancy.

Rahul
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat Restart is not same as Stop and Start

2011-08-03 Thread mike
Permission problem perhaps? Not really sure what you're doing but the 
fact that you have users configuring the cluster (why do you do this 
btw?) may be pointing to a permission issue.

-mgb
On 11-08-03 06:57 PM, Rahul Kanna wrote:
 Hi,

 Our system setup:

 Heartbeat 3.0.3
 DRBD (to manage file system and it is one of the resource managed by CRM)
 Redhat Linux
 Pacemaker

 We have built an application on top of Linux-HA for users to configure
 cluster by giving IP addresses of the nodes, do operations like Restart
 system, Change host names, Resolve split-brain scenario etc.
 In our application, we ran into a problem when we do heartbeat restart for
 some operation and then the user does Restart System, which internally
 runs the command shutdown -r now. I believe this is due to the heartbeat LSB
 script, and I have explained the scenario below.

 Problem:

 In the heartbeat LSB script, restart neither removes nor touches the
 heartbeat lock file.

 On heartbeat start, the LSB script starts heartbeat and touches the
 /var/lock/subsys/heartbeat lock file.

 On heartbeat stop, the LSB script stops heartbeat and removes the lock
 file at /var/lock/subsys/heartbeat.

 On heartbeat restart, the LSB script stops heartbeat and starts
 heartbeat, but DOES NOT remove or touch the lock file.

 We call heartbeat restart instead of heartbeat start through our script
 because we are not sure whether heartbeat is already running or not. So when
 heartbeat restart is called while heartbeat is NOT running, the heartbeat LSB
 script tries to stop it, finds it not running, and just starts heartbeat; BUT
 after starting, the heartbeat lock file is not touched (because of the restart
 branch in the heartbeat LSB script). So now heartbeat is running on the system
 (you can verify this by looking for the heartbeat process or with the heartbeat
 status command) but there is no /var/lock/subsys/heartbeat lock file. This lock
 file is used by the init scripts to know which processes to stop on shutdown
 (shutdown -r now). When we run shutdown -r now, the system thinks heartbeat is
 not running (because there is no lock file) and does not stop heartbeat
 properly. When the node comes back up, heartbeat is started but its state is
 not correct (because it was not stopped properly).
 Due to this, the node identifies itself as Primary even though the erstwhile
 Secondary node has become Primary, and this causes split-brain.

 So I believe heartbeat restart should do exactly what heartbeat stop followed
 by heartbeat start does, which is not the case now.
 Can you please let me know if my understanding is correct and whether this is
 a bug in the heartbeat LSB script? Thanks for looking into it.

 I have given below the relevant code from heartbeat lsb script as well

 File: /etc/init.d/heartbeat

start)
    RunStartStop pre-start
    StartHA
    RC=$?
    echo
    if [ $RC -eq 0 ]; then
        [ ! -d $LOCKDIR ] && mkdir -p $LOCKDIR
        touch $LOCKDIR/$SUBSYS
    fi
    RunStartStop post-start $RC
    ;;

stop)
    RunStartStop pre-stop
    StopHA
    RC=$?
    echo
    if [ $RC -eq 0 ]; then
        rm -f $LOCKDIR/$SUBSYS
    fi
    RunStartStop post-stop $RC
    ;;

restart)
    sleeptime=`ha_parameter deadtime`
    StopHA
    echo
    echo -n "Waiting to allow resource takeover to complete:"
    sleep $sleeptime
    sleep 10 # allow resource takeover to complete (hopefully).
    echo_success
    echo
    StartHA
    echo
    ;;
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
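The restart behavior described above can be made consistent with stop-then-start by removing and re-creating the lock file in the restart branch. A runnable sketch with StartHA/StopHA stubbed out for illustration (LOCKDIR here defaults to a temp path, not the real /var/lock/subsys):

```shell
#!/bin/sh
# Sketch: make "restart" equal "stop then start" with respect to the
# subsys lock file. StartHA/StopHA are stubs standing in for the real
# init-script functions; LOCKDIR is a demo path, not /var/lock/subsys.
LOCKDIR="${LOCKDIR:-/tmp/hb-lock-demo}"
SUBSYS=heartbeat

StartHA() { :; }  # stand-in: would start the heartbeat daemons
StopHA()  { :; }  # stand-in: would stop them

do_stop() {
    StopHA && rm -f "$LOCKDIR/$SUBSYS"
}

do_start() {
    StartHA || return 1
    mkdir -p "$LOCKDIR"
    touch "$LOCKDIR/$SUBSYS"   # lock file now matches running state
}

do_restart() {
    do_stop     # removes the lock file, exactly as "stop" would
    do_start    # re-creates it, exactly as "start" would
}
```

With this structure, shutdown -r now always sees a lock file that matches whether heartbeat is actually running.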


Re: [Linux-HA] Heartbeat 3.0.3 stable version + RHEL 6.1: restart network will make heartbeat not send broadcasts

2011-07-18 Thread Ai Lei
Hi:

I'm using Heartbeat 3.0.3 stable version on RHEL 6.1 x64 platform, and found
following issue:
If I restart the network service, heartbeat will no longer send broadcast
packets from port 694. That means this node never gets a chance to join the HA
cluster again unless it is restarted.

Details for setting cluster:

1. Compile heartbeat 3.0.3 from source and install it on 2 RHEL 6.1 x64
nodes: installer001 and rhel61
2. Compile pacemaker 1.0.9 from source and install it on 2 RHEL 6.1 x64
nodes
3. Configure /etc/ha.d/ha.cf and make sure both of these 2 nodes are Online
via crm status
4. Run tcpdump -i eth0 port 694; we can see that both of these 2 nodes are
sending heartbeat broadcast packets.

Details of configuration file:
=
[root@rhel61 ~]# cat /etc/ha.d/ha.cf
autojoin none
bcast eth0
warntime 5
deadtime 15
initdead 60
keepalive 2
node installer001
node rhel61
crm respawn


Then I tried to restart the network service on the backup node installer001,
or just ran ifdown eth0; ifup eth0. Node rhel61 then detected
installer001 as offline immediately, and node installer001 detected
rhel61 as offline.
Then I ran tcpdump -i eth0 port 694 on installer001 again: we can only
see rhel61 still sending broadcast packets, and no broadcast packets
coming from installer001, although the eth0 network is fully recovered now.

I've tried exactly the same case on RHEL 5.6 (heartbeat 3.0.3), and it works
well there: after restarting the network, the node can still send out broadcast
packets...

Thanks for your comments.
--Lei
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat three node configuration

2011-06-26 Thread Andrew Beekhof
On Thu, Jun 9, 2011 at 11:54 PM, Ricardo F ri...@hotmail.com wrote:

 What is the configuration for create a three node cluster?,

Essentially you need Pacemaker on top.
haresources-based clusters were only designed for 2 nodes.

 i have this, but the servers bring up the shared IP at the same time:

 ha.cf:
 logfacility local0
 keepalive 2
 deadtime 10
 warntime 5
 initdead 30
 auto_failback off
 ucast bond0 host1 host2 host3
 node host1
 node host2
 node host3

 haresources:
 host1 192.168.1.10/24/bond0

 i use heartbeat 3.0.3 on Debian squeeze on all of the nodes; all of them
 have the others' IPs in /etc/hosts and I can propagate the conf with
 ha_propagate.

 Thanks
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat step down after split brain scenario

2011-06-20 Thread Jack Berg


Hi - thanks for the response.


Dimitri Maziuk wrote:
 
 
 
 What do you mean by disconnecting: what's your failure scenario and
 how do you expect it to be handled?
 

The disconnection is the loss of the intersite link which interrupts
heartbeat comms.

In this case it's expected that both sites will acquire the resources and
become active.

However, what I want to happen is that one of the sites will give up the
resources again when it sees that the other site is up again.


Dimitri Maziuk wrote:
 
 
 Running daemons are not guaranteed (arguably, expected) to notice when
 the network cable is unplugged. You have to monitor the link and restart
 all processes that bind()/listen() on the interface.
 
 If your nodes are at different sites, you need to also deal with the
 loss of link at the switch, gateway, etc., and figure out which one is
 still connected to the Internet -- and gets to keep the VIP. Which in
 general can't be done from the nodes themselves.
 

Yes - in this case neither site has to be connected to the internet, this is
more an internal load balancing act between two connected sites in a
customers network.

What I found is that by setting auto_failback on in ha.cf at both sites,
the site/node listed in haresources will keep the resources when the link is
re-established, and the other site will release the resources. 

This is the result I was looking for.

Regards
Jack
-- 
View this message in context: 
http://old.nabble.com/heartbeat-step-down-after-split-brain-scenario-tp31858728p31884521.html
Sent from the Linux-HA mailing list archive at Nabble.com.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
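The fix Jack describes maps onto a short ha.cf fragment on both nodes; a sketch (node names are illustrative):

```
# The node listed first in haresources reclaims its resources
# when the intersite link comes back; the other node releases them.
auto_failback on
node site-a-node
node site-b-node
```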


Re: [Linux-HA] heartbeat step down after split brain scenario

2011-06-16 Thread Dimitri Maziuk
On 06/16/2011 04:28 AM, Jack Berg wrote:
 
 I have a two node cluster using heartbeat and haproxy. Unfortunately it is
 impossible to provide redundant heartbeat paths between the two nodes at
 different sites so it is possible for a failure to cause split brain.
 
 To evaluate the impact I tried disconnecting the two nodes and I found that
 both become active and both try to keep the VIPs after the link is restored.

What do you mean by disconnecting: what's your failure scenario and
how do you expect it to be handled?

Running daemons are not guaranteed (arguably, expected) to notice when
the network cable is unplugged. You have to monitor the link and restart
all processes that bind()/listen() on the interface.

If your nodes are at different sites, you need to also deal with the
loss of link at the switch, gateway, etc., and figure out which one is
still connected to the Internet -- and gets to keep the VIP. Which in
general can't be done from the nodes themselves.

Dima
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Dejan Muhamedagic
Hi,

On Mon, May 23, 2011 at 03:18:37PM -0700, Nulgor Wankevitch wrote:
 hi,
 
 heartbeat seems to be sending UDP on port 694 to the whole network segment, 

Do you use ucast or bcast? With the latter, which is broadcast,
it's of course expected. If it happens with the former, then you
must have gremlins in your network.

Thanks,

Dejan

 not just the configured peer host, and it is getting blocked by the
 firewall; how can I limit this?
 
 Firewall: *UDP_IN Blocked* IN=eth0 OUT= 
 MAC=ff:ff:ff:ff:ff:ff:00:22:19:21:f1:75:08:00 SRC=192.168.1.190 
 DST=192.168.1.255 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP 
 SPT=42414 DPT=694 LEN=226
 
 any help thnk you,
 nulgor
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
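To keep heartbeat traffic off the broadcast address entirely, ha.cf can list one ucast line per peer. A fragment using the address from the firewall log above (the second address is a hypothetical peer, not taken from the thread):

```
# Same ha.cf on both nodes; heartbeat ignores the line naming
# the node's own address and sends only to the other one.
ucast eth0 192.168.1.190
ucast eth0 192.168.1.191
```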


Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Nulgor Wankevitch
Hi,

thanks for the reply. When using ucast, things do not seem to work: the
nodes are able to bring up the VIP but not any services. When using bcast,
things seem to work correctly, but there is that broadcast problem. I would
like to firewall the broadcast and isolate it to the local machine and the
2nd node; however, I do not want to cause additional problems.
Please advise, thanks.

nulgor

On 5/24/2011 1:52 AM, Dejan Muhamedagic wrote:
 Hi,

 On Mon, May 23, 2011 at 03:18:37PM -0700, Nulgor Wankevitch wrote:
 hi,

 heartbeat seems to be send udp on port 694 to the whole network segment,
 Do you use ucast or bcast? With the latter, which is broadcast
 it's of course expected. If it happens with the former, then you
 must have gremlins in your network.

 Thanks,

 Dejan

 not just the link host, and
 getting blocked by firewall, how to limit?

 Firewall: *UDP_IN Blocked* IN=eth0 OUT=
 MAC=ff:ff:ff:ff:ff:ff:00:22:19:21:f1:75:08:00 SRC=192.168.1.190
 DST=192.168.1.255 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
 SPT=42414 DPT=694 LEN=226

 any help thnk you,
 nulgor
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Dejan Muhamedagic
Hi,

On Tue, May 24, 2011 at 02:12:12AM -0700, Nulgor Wankevitch wrote:
 Hi,
 
 thnk for reply, when use ucast things do not seem to work, the nodes are 
 able
 to bring up the VIP but not any services. When using bcast things seem 
 to work correctly

Wow! You really do have gremlins somewhere. ucast cannot fail in the way
you described. Either the nodes can communicate or they can't. Did you set
the right IP address of the peer? Otherwise there must be some kind of
network setup issue.

Thanks,

Dejan

 but there is that broadcast problem, I would like to firewall the 
 broadcast and isolate
 it to the local machine and 2nd node however I do not want to cause 
 additional problems,
 please advise, thks.
 
 nulgor
 
 On 5/24/2011 1:52 AM, Dejan Muhamedagic wrote:
  Hi,
 
  On Mon, May 23, 2011 at 03:18:37PM -0700, Nulgor Wankevitch wrote:
  hi,
 
  heartbeat seems to be send udp on port 694 to the whole network segment,
  Do you use ucast or bcast? With the latter, which is broadcast
  it's of course expected. If it happens with the former, then you
  must have gremlins in your network.
 
  Thanks,
 
  Dejan
 
  not just the link host, and
  getting blocked by firewall, how to limit?
 
  Firewall: *UDP_IN Blocked* IN=eth0 OUT=
  MAC=ff:ff:ff:ff:ff:ff:00:22:19:21:f1:75:08:00 SRC=192.168.1.190
  DST=192.168.1.255 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
  SPT=42414 DPT=694 LEN=226
 
  any help thnk you,
  nulgor
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
  ___
  Linux-HA mailing list
  Linux-HA@lists.linux-ha.org
  http://lists.linux-ha.org/mailman/listinfo/linux-ha
  See also: http://linux-ha.org/ReportingProblems
 
 
 
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Nulgor Wankevitch
ya, gremlins, very reassuring, thanks.

On 5/24/2011 2:42 AM, Dejan Muhamedagic wrote:
 Hi,

 On Tue, May 24, 2011 at 02:12:12AM -0700, Nulgor Wankevitch wrote:
 Hi,

 thnk for reply, when use ucast things do not seem to work, the nodes are
 able
 to bring up the VIP but not any services. When using bcast things seem
 to work correctly
 Wow! You really do have gremlins somewhere. ucast cannot not work
 in the way you described. Either the nodes can communicate or
 they can't. Did you set the right IP address of the peer? Or
 there must be some kind of network setup issue.

 Thanks,

 Dejan

 but there is that broadcast problem, I would like to firewall the
 broadcast and isolate
 it to the local machine and 2nd node however I do not want to cause
 additional problems,
 please advise, thks.

 nulgor

 On 5/24/2011 1:52 AM, Dejan Muhamedagic wrote:
 Hi,

 On Mon, May 23, 2011 at 03:18:37PM -0700, Nulgor Wankevitch wrote:
 hi,

 heartbeat seems to be send udp on port 694 to the whole network segment,
 Do you use ucast or bcast? With the latter, which is broadcast
 it's of course expected. If it happens with the former, then you
 must have gremlins in your network.

 Thanks,

 Dejan

 not just the link host, and
 getting blocked by firewall, how to limit?

 Firewall: *UDP_IN Blocked* IN=eth0 OUT=
 MAC=ff:ff:ff:ff:ff:ff:00:22:19:21:f1:75:08:00 SRC=192.168.1.190
 DST=192.168.1.255 LEN=246 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP
 SPT=42414 DPT=694 LEN=226

 any help thnk you,
 nulgor
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Dimitri Maziuk
On 05/24/2011 05:48 AM, Nulgor Wankevitch wrote:
 ya, gremlins, very reassuring, thanks.

If the broadcast packets from host A are seen by host B, and unicast
packets from host A to host B are not seen by host B, then your universe
is governed by laws of physics we here are completely unfamiliar with.
Sometimes we call them gremlins.

HTH
Dima
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Nulgor Wankevitch
I think you guys might have jumped the gun on me, why would you
assume it is not seen? I reported it will bring up the VIP but not
the services.

nulgor

On 5/24/2011 9:37 AM, Dimitri Maziuk wrote:
 On 05/24/2011 05:48 AM, Nulgor Wankevitch wrote:
 ya, gremlins, very reassuring, thanks.
 If the broadcast packets from host A are seen by host B, and unicast
 packets from host A to host B are not seen by host B, then your universe
 is governed by laws of physics we here are completely unfamiliar with.
 Sometimes we call them gremlins.

 HTH
 Dima


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Dimitri Maziuk
On 05/24/2011 02:56 PM, Nulgor Wankevitch wrote:
 I think you guys might have jumped the gun on me, why would you
 assume it is not seen? I reported it will bring up the VIP but not
 the services.

The only way I can vaguely imagine that possibly happening is if cib
isn't propagated to the other node(s) due to, indeed, a problem with
comms channel. However, I can think of only one way to make that happen
over unicast but not broadcast: unicasting to a wrong host.

Dima
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu



___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Nulgor Wankevitch
it seems like the CIB is on both nodes, as I am able to view both from crm_mon,
and crm configure show shows the same info; am I correct?

On 5/24/2011 2:02 PM, Dimitri Maziuk wrote:
 On 05/24/2011 02:56 PM, Nulgor Wankevitch wrote:
 I think you guys might have jumped the gun on me, why would you
 assume it is not seen? I reported it will bring up the VIP but not
 the services.
 The only way I can vaguely imagine that possibly happening is if cib
 isn't propagated to the other node(s) due to, indeed, a problem with
 comms channel. However, I can think of only one way to make that happen
 over unicast but not broadcast: unicasting to a wrong host.

 Dima


 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat sends udp to whole network

2011-05-24 Thread Lars Ellenberg
On Tue, May 24, 2011 at 02:10:25PM -0700, Nulgor Wankevitch wrote:
 it seems like cib is on both nodes as I am able to view both from crm_mon
 and crm configure show shows the same info, am I correct?

This does not lead anywhere.

You complained that broadcast broadcasts.
Well, that's the nature of it.

Then use unicast.
But unicast does not work for me.

Some talk about gremlins...
Let's skip that.

So: why does unicast seem to not work for you?

Maybe provide logs? E.g. a hb_report from starting up nodes configured
with unicast to them bringing up some, but not all, stuff?

And then we go from there.

BTW, you can directly ask heartbeat
what it thinks about its comm channels:

for node in $(cl_status listnodes); do
    for link in $(cl_status listhblinks "$node"); do
        linkstatus=$(cl_status hblinkstatus "$node" "$link")
        printf "%s\t%s\t%s\n" "$node" "$link" "$linkstatus"
    done
done

We should add a pretty-print-all-known-link-states option to cl_status...
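
[Editor's note: for readers without a live cluster handy, here is the same
loop with cl_status stubbed out, so the tab-separated output it produces can
be seen. The node and link names are made up for illustration only.]

```shell
#!/bin/sh
# Stub cl_status so the loop below runs without a live heartbeat cluster.
# Node/link names are hypothetical; a real cl_status queries the daemon.
cl_status() {
  case "$1" in
    listnodes)    printf 'node1\nnode2\n' ;;
    listhblinks)  printf 'eth0\neth1\n' ;;
    hblinkstatus) echo up ;;
  esac
}

# Same loop as above: one tab-separated line per node/link pair.
for node in $(cl_status listnodes); do
  for link in $(cl_status listhblinks "$node"); do
    linkstatus=$(cl_status hblinkstatus "$node" "$link")
    printf '%s\t%s\t%s\n' "$node" "$link" "$linkstatus"
  done
done
```

On a real cluster a dead unicast link shows up here as a status other
than "up", which is a quicker check than digging through the logs.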

-- 
: Lars Ellenberg
: LINBIT | Your Way to High Availability
: DRBD/HA support and consulting http://www.linbit.com

DRBD® and LINBIT® are registered trademarks of LINBIT, Austria.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] Heartbeat kills itself

2011-05-05 Thread Andrea Bertucci
On 05/05/2011 11:45 AM, Lacoco, Joshua wrote:
 Hello,

 I have a 2 node cluster on RHEL 5.4.  I am currently only running the 
 heartbeat service on one node because the heartbeat service kills itself and 
 I'm trying to avoid downtime/split brain issues.  I've tried searching and I 
 found posts that have similar problems.  I am running heartbeat 3.0.2-1.  
 Below are the same messages I am getting (from a different post).  Does 
 anyone know if this is a known issue or can point me in the right direction?  
 I'm stumped.
Hello.
I had a similar problem on RHEL 5.4 with heartbeat 2.3 and heartbeat 3
(I don't remember the exact software versions). The only thing that fixed
this problem was to download a recent kernel version and substitute it
for the original one.
Hope this helps.
Andrea.

-- 

Andrea Bertucci
Aitek S.p.A. - Via della Crocetta, 15 - I-16122 Genova
tel.: +39 010 846731
fax: +39 010 8467350 - e-mail: abertu...@aitek.it

---

This e-mail and any files transmitted with it are confidential and intended 
solely for the use of the individual to whom it is addressed. If you have 
received this email in error please send it back. Unauthorized publication, 
use, disclosure, forwarding, printing or copying of this email and its 
associated attachments is strictly prohibited.

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread mike
On 11-04-22 06:25 AM, SEILLIER Mathieu wrote:
 Hi all,
 First I'm french so sorry in advance for my English...

 I have to use Heartbeat for High Availability between 2 Tomcat 5.5 servers 
 under Linux RedHat 5.3. The first server is active, the other one is passive. 
 The master is called servappli01, with IP address 186.20.100.40, the slave is 
 called servappli02, with IP address 186.20.100.39.
 I configured a virtual IP 186.20.100.41. Each Tomcat is not launched when 
 server is started, this is Heartbeat which starts Tomcat when it's running.
 My problem is : When heartbeat is started on the first server, then on the 
 second server, the VIP is assigned to the 2 servers ! also, Tomcat is started 
 on each server, and each node see the other node as dead !

 Here is my configuration :

 ha.cf file (the same on each server) :

 logfile /var/log/ha-log

 debugfile /var/log/ha-debug

 logfacility none

 keepalive 2

 warntime 6

 deadtime 10

 initdead 90

 bcast eth0

 node servappli01 servappli02

 auto_failback yes

 respawn hacluster /usr/lib/heartbeat/ipfail

 apiauth ipfail gid=haclient uid=hacluster


 haresources file (the same on each server) :

 servappli01 IPaddr::186.20.100.41/24/eth0 tomcat


 Result of ifconfig command on the first server (servappli01) :

 eth0  Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38

inet adr:186.20.100.40  Bcast:186.20.100.255  Masque:255.255.255.0

adr inet6: fe80::21e:bff:febb:c238/64 Scope:Lien

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

RX packets:14404996 errors:0 dropped:0 overruns:0 frame:0

TX packets:6580505 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 lg file transmission:1000

RX bytes:385833 (3.5 GiB)  TX bytes:2694953468 (2.5 GiB)

Interruption:177 Memoire:fa00-fa012100



 eth0:0Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38

inet adr:186.20.100.41  Bcast:186.20.100.255  Masque:255.255.255.0

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Interruption:177 Memoire:fa00-fa012100

 Result of ifconfig command on the second server (servappli02) at the same 
 time :

 eth0  Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C

inet adr:186.20.100.39  Bcast:186.20.100.255  Masque:255.255.255.0

adr inet6: fe80::21e:bff:fe77:c90c/64 Scope:Lien

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

RX packets:23815049 errors:0 dropped:0 overruns:0 frame:0

TX packets:17441845 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 lg file transmission:1000

RX bytes:2620027933 (2.4 GiB)  TX bytes:3595896739 (3.3 GiB)

Interruption:177 Memoire:fa00-fa012100



 eth0:0Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C

inet adr:186.20.100.41  Bcast:186.20.100.255  Masque:255.255.255.0

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Interruption:177 Memoire:fa00-fa012100

 Result of /usr/bin/cl_status listnodes command (on each server) :

 servappli02

 servappli01


 Result of /usr/bin/cl_status nodestatus servappli01 command on servappli01 :

 active

 Result of /usr/bin/cl_status nodestatus servappli02 command on servappli01 :

 dead

 Result of /usr/bin/cl_status nodestatus servappli01 command on servappli02 :

 dead

 Result of /usr/bin/cl_status nodestatus servappli02 command on servappli02 :

 active

 And of course, if I kill Tomcat on master server, there's no switch to the 
 second server (a call to a webapp using the VIP doesn't work).

 Can somebody help me please ?
 I guess there is something wrong but I don't know what!
 Thanx

 Mathieu
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems


It almost sounds like the nodes are unaware of each other. Could be a 
network thing, maybe. Here are some things to try:
Can you ssh or ping one node from the other?
Bring up one node with the VIP running - leave the other node up but 
heartbeat down. Can you ping the VIP from the node NOT running HA?
What happens when you look at the cluster when both nodes are running - 
use the crm_mon command and paste what you see in here.

I'm thinking you have some sort of network issue.
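
[Editor's note: the checks above can be sketched as a short script. This is
a dry-run illustration only -- it echoes the commands rather than running
them, and the PEER/VIP addresses are the illustrative values from Mathieu's
mail, to be adjusted for a real site.]

```shell
#!/bin/sh
# Dry-run sketch of the suggested connectivity checks.
# Addresses are illustrative (from the thread); adjust for your site.
PEER=186.20.100.39
VIP=186.20.100.41

run() { echo "would run: $*"; }   # swap echo for real execution on a live pair

run ping -c 3 "$PEER"     # check 1: basic node-to-node reachability
run ssh "root@$PEER" true # check 1: can we ssh across?
run ping -c 3 "$VIP"      # check 2: from the node NOT running heartbeat
run crm_mon -1            # check 3: one-shot cluster status on each node
```

If check 1 already fails, the duplicate VIP is simply split-brain: each
node thinks it is alone and claims the address itself.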
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

2011-04-26 Thread Amit Jathar
Have you generated the authkey with the corosync-keygen command on one node and 
then copied that file to the other node?

-Original Message-
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On Behalf Of mike
Sent: Tuesday, April 26, 2011 5:41 PM
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] [Heartbeat] my VIP doesn't work :(

On 11-04-22 06:25 AM, SEILLIER Mathieu wrote:
 Hi all,
 First I'm french so sorry in advance for my English...

 I have to use Heartbeat for High Availability between 2 Tomcat 5.5 servers 
 under Linux RedHat 5.3. The first server is active, the other one is passive. 
 The master is called servappli01, with IP address 186.20.100.40, the slave is 
 called servappli02, with IP address 186.20.100.39.
 I configured a virtual IP 186.20.100.41. Each Tomcat is not launched when 
 server is started, this is Heartbeat which starts Tomcat when it's running.
 My problem is : When heartbeat is started on the first server, then on the 
 second server, the VIP is assigned to the 2 servers ! also, Tomcat is started 
 on each server, and each node see the other node as dead !

 Here is my configuration :

 ha.cf file (the same on each server) :

 logfile /var/log/ha-log

 debugfile /var/log/ha-debug

 logfacility none

 keepalive 2

 warntime 6

 deadtime 10

 initdead 90

 bcast eth0

 node servappli01 servappli02

 auto_failback yes

 respawn hacluster /usr/lib/heartbeat/ipfail

 apiauth ipfail gid=haclient uid=hacluster


 haresources file (the same on each server) :

 servappli01 IPaddr::186.20.100.41/24/eth0 tomcat


 Result of ifconfig command on the first server (servappli01) :

 eth0  Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38

inet adr:186.20.100.40  Bcast:186.20.100.255
 Masque:255.255.255.0

adr inet6: fe80::21e:bff:febb:c238/64 Scope:Lien

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

RX packets:14404996 errors:0 dropped:0 overruns:0 frame:0

TX packets:6580505 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 lg file transmission:1000

RX bytes:385833 (3.5 GiB)  TX bytes:2694953468 (2.5
 GiB)

Interruption:177 Memoire:fa00-fa012100



 eth0:0Link encap:Ethernet  HWaddr 00:1E:0B:BB:C2:38

inet adr:186.20.100.41  Bcast:186.20.100.255
 Masque:255.255.255.0

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Interruption:177 Memoire:fa00-fa012100

 Result of ifconfig command on the second server (servappli02) at the same 
 time :

 eth0  Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C

inet adr:186.20.100.39  Bcast:186.20.100.255
 Masque:255.255.255.0

adr inet6: fe80::21e:bff:fe77:c90c/64 Scope:Lien

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1

RX packets:23815049 errors:0 dropped:0 overruns:0 frame:0

TX packets:17441845 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 lg file transmission:1000

RX bytes:2620027933 (2.4 GiB)  TX bytes:3595896739 (3.3
 GiB)

Interruption:177 Memoire:fa00-fa012100



 eth0:0Link encap:Ethernet  HWaddr 00:1E:0B:77:C9:0C

inet adr:186.20.100.41  Bcast:186.20.100.255
 Masque:255.255.255.0

UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
Interruption:177 Memoire:fa00-fa012100

 Result of /usr/bin/cl_status listnodes command (on each server) :

 servappli02

 servappli01


 Result of /usr/bin/cl_status nodestatus servappli01 command on servappli01 :

 active

 Result of /usr/bin/cl_status nodestatus servappli02 command on servappli01 :

 dead

 Result of /usr/bin/cl_status nodestatus servappli01 command on servappli02 :

 dead

 Result of /usr/bin/cl_status nodestatus servappli02 command on servappli02 :

 active

 And of course, if I kill Tomcat on master server, there's no switch to the 
 second server (a call to a webapp using the VIP doesn't work).

 Can somebody help me please ?
 I guess there is something wrong but I don't know what!
 Thanx

 Mathieu
 ___
 Linux-HA mailing list
 Linux-HA@lists.linux-ha.org
 http://lists.linux-ha.org/mailman/listinfo/linux-ha
 See also: http://linux-ha.org/ReportingProblems


It almost sounds like the nodes are unaware of each other. Could be a network 
thing, maybe. Here are some things to try:
Can you ssh or ping one node from the other?
Bring up one node with the VIP running - leave the other node up but heartbeat 
down. Can you ping the VIP from the node NOT running HA?
What happens when you look at the cluster when both nodes are running - use the 
crm_mon command and paste what you see in here.

I'm thinking you have some sort of network issue.
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
