Re: [Linux-HA] heartbeat failover
Hello, thanks a lot! I didn't know that heartbeat is almost deprecated. I'll try corosync and pacemaker, but I read that corosync needs to run over multicast. Unfortunately, I can't use multicast in my network. Do you know of any other possibility? I can't find anything saying that corosync can run without multicast.

Best regards
Björn

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
Sent: Wednesday, 22 January 2014 20:36
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] heartbeat failover

On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:

Hello, I have a drbd+nfs+heartbeat setup and in general it's working, but failover takes too long and I'm trying to tune that. When node 1 is active and I shut down node 2, node 1 tries to activate the cluster. The problem is that node 1 already holds the primary role, and re-activating takes time again; during this time the NFS share isn't available. Is it possible to disable this? Node 1 doesn't have to do anything if it is already in the primary role and the second node is not available.

Best regards
Björn

If this is a new project, I strongly recommend switching out heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long time, and there are no plans to restart development in the future. Everything (even RH) is standardizing on the corosync+pacemaker stack, so it has the most vibrant community as well.

--
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without access to education?
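For orientation, a DRBD-backed NFS export under corosync/pacemaker is often expressed roughly like the sketch below (crm shell syntax; the DRBD resource name "nfsdata", the device, the mount point and the filesystem type are made-up placeholders, not values from this thread):

    # DRBD resource and its master/slave wrapper
    primitive p_drbd_nfs ocf:linbit:drbd \
        params drbd_resource="nfsdata" \
        op monitor interval="29s" role="Master" \
        op monitor interval="31s" role="Slave"
    ms ms_drbd_nfs p_drbd_nfs \
        meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
    # Filesystem and NFS server on top of the promoted DRBD device
    primitive p_fs_nfs ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/srv/nfs" fstype="ext4"
    primitive p_nfsserver ocf:heartbeat:nfsserver \
        params nfs_shared_infodir="/srv/nfs/nfsinfo"
    group g_nfs p_fs_nfs p_nfsserver
    colocation col_nfs_on_drbd_master inf: g_nfs ms_drbd_nfs:Master
    order o_drbd_before_nfs inf: ms_drbd_nfs:promote g_nfs:start

In pacemaker the promoted role is tracked explicitly, so a node that already holds the DRBD Primary role is normally left alone when its peer disappears, which is essentially the behaviour asked about above.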
Re: [Linux-HA] heartbeat failover
Hi Björn

Here is an example of how you can set up corosync to use unicast UDP:
https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu

The important parts are transport: udpu and that you need to configure every member manually using memberaddr: 10.16.35.115.

Best regards
Lukas

--
Adfinis SyGroup AG
Lukas Grossar, System Engineer
Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direct 031 550 31 06
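A minimal totem section along those lines could look like this (corosync 1.x syntax as in the linked example; the addresses are placeholders and would be replaced by the real node IPs):

    totem {
        version: 2
        transport: udpu
        interface {
            ringnumber: 0
            bindnetaddr: 10.16.35.0      # network of the local interface
            mcastport: 5405              # UDP port used even for unicast
            member {
                memberaddr: 10.16.35.110 # node 1
            }
            member {
                memberaddr: 10.16.35.115 # node 2
            }
        }
    }

The same member list goes into the configuration on every node; only bindnetaddr changes to match the local interface.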
Re: [Linux-HA] heartbeat failover
Hi Lukas,

thank you. Well, I have to wait for some firewall changes for UDP 5405. But I'm not sure whether what I'm doing is correct.

Node 1:

    interface {
        member {
            memberaddr: 10.128.61.60 # node 1
        }
        member {
            memberaddr: 10.128.62.60 # node 2
        }
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 10.128.61.0
        mcastport: 5405
    }
    transport: udpu

Node 2:

    interface {
        member {
            memberaddr: 10.128.61.60
        }
        member {
            memberaddr: 10.128.62.60
        }
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 10.128.62.0
        mcastport: 5405
    }
    transport: udpu

Something definitely seems to be wrong. My firewall was under very high load...

Best regards
Björn
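As an aside, whatever rule ends up on the ASA, it has to allow UDP towards port 5405 (many guides also open 5404, i.e. mcastport - 1) in both directions between the two member addresses. On a Linux host the equivalent local rules would look roughly like this (iptables syntax, purely illustrative; the ASA configuration itself is of course different):

    # on node 1 (10.128.61.60): accept corosync unicast traffic from node 2
    iptables -A INPUT -p udp -s 10.128.62.60 --dport 5404:5405 -j ACCEPT
    # on node 2 (10.128.62.60): accept corosync unicast traffic from node 1
    iptables -A INPUT -p udp -s 10.128.61.60 --dport 5404:5405 -j ACCEPT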
Re: [Linux-HA] heartbeat failover
Uhhh... I now have the same configuration as the example config you sent me, but I'm causing high CPU load on our Cisco ASA firewall. I guess this traffic is not normal?

root@node01:/etc/corosync# tcpdump dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:06.093140 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.097327 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.113418 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.286517 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.291095 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.480221 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.484520 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.500608 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.673721 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.678654 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.867757 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.872492 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.888576 IP node01.52580 > node02.5405: UDP, length 82
17:41:07.061664 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.066304 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.255409 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.260512 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.275601 IP node01.52580 > node02.5405: UDP, length 82

Best regards
Björn
[Linux-HA] two node cluster with postfix - how to get system mails from both nodes
Hello,

I'm looking for the right way to integrate Postfix into my 2-node cluster. The question: how do I receive system mails from the node where Postfix is not started?

This guide brought me to my current solution: http://www.linux-ha.org/wiki/Postfix_(resource_agent)

I start Postfix with

    master_service_disable = inet
    alternate_config_directories = /etc/postfix.mail

via the system's init script, and created a second instance which is controlled by pacemaker for the POP/SMTP stuff, but this depends on the main Postfix instance. Furthermore, I'm having problems starting the second instance with the resource agent ("the Postfix mail system is not running"). Before I spend nights solving this, I'm asking whether this is the way it should be done. Is there an easier way to get system mails from both nodes without the need for two Postfix instances?

This is my first mail to a mailing list ever, so please be lenient with me ;)

Thanks
chris
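For reference, a sketch of how such a second instance might be declared in pacemaker (crm shell syntax; only the config_dir value comes from the setup described above, the resource name and operation timeouts are assumptions):

    # Pacemaker-managed Postfix instance using the alternate config directory
    primitive p_postfix_mail ocf:heartbeat:Postfix \
        params config_dir="/etc/postfix.mail" \
        op monitor interval="30s" timeout="45s" \
        op start timeout="60s" \
        op stop timeout="60s"

The agent starts Postfix against the given configuration directory, so the main instance's alternate_config_directories setting is still needed for the second instance to be allowed at all, which matches what the linked wiki page describes.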
Re: [Linux-HA] two node cluster with postfix - how to get system mails from both nodes
On 01/23/2014 11:14 AM, Christian Richter wrote:

Hello, I'm looking for the right way to integrate Postfix into my 2-node cluster.

The right way is: don't. Read e.g. http://serverfault.com/questions/303554/how-to-build-a-high-availability-postfix-system

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
Re: [Linux-HA] /usr/sbin/lrmadmin missing from cluster-glue
On Sat, 28 Dec 2013 11:18:44 -0500 Tom Parker tpar...@cbnco.com wrote:

Hello,
/usr/sbin/lrmadmin is missing from the latest version of cluster-glue in SLES SP3. Has the program been deprecated or is this an issue in the packaging of the RPM?
Thanks
Tom

Hi,

I know this is a bit late, but I just discovered this email. Yes, lrmadmin has been deprecated, since it is incompatible with recent versions of pacemaker.

--
// Kristoffer Grönlund
// kgronl...@suse.com
[Linux-HA] Antw: Re: heartbeat failover
We have a server with a network traffic light on the front. With corosync/pacemaker the light is constantly flickering, even if the cluster does nothing. So I guess it's normal. If your firewall has a problem with that, I can tell you that the traffic will increase if your configuration grows and if the cluster actually does something.

I don't know whether "communicate like mad" was a design concept, but on our HP-UX Service Guard cluster there was only the heartbeat traffic (configured to be one packet every 7 seconds (for good luck reasons ;-)) when the cluster was idle. I noticed that with cLVM (part of the "log like mad" family) doing mirroring, the cluster communication frequently breaks down. When you read about the TOTEM design goals, you might conclude that either the implementation is broken, or the configuration you use is. Maybe the system is just too complex.

Here are two examples; first the "log like mad", then the corosync communication problems:
---
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for 0. len 31
Jan 23 13:51:09 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900. client=0x6a2a80, msg=0x7f2f78ffd1d4, len=31, csid=0x740294c4, xid=0
Jan 23 13:51:09 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:09 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x500 XID 1 on node 230314ac
Jan 23 13:51:09 o1 lvm[23717]: Syncing device names
Jan 23 13:51:09 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for 587404460. len 18
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for 587404460. len 18
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for 0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900. client=0x6a2a80, msg=0x7f2f78ffd524, len=31, csid=0x740294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x500 XID 6 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for 0. len 29
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900. client=0x6a2a80, msg=0x7f2f78ffd874, len=29, csid=0x740294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command LOCK_VG (0x33) for clientid 0x500 XID 8 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: do_lock_vg: resource 'P_#global', cmd = 0x4 LCK_VG (WRITE|VG), flags = 0x4 ( DMEVENTD_MONITOR ), critical_section = 0
Jan 23 13:51:10 o1 lvm[23717]: Refreshing context
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for 0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900. client=0x6a2a80, msg=0x7f2f78ffdbc4, len=31, csid=0x740294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x500 XID 10 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for 0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900. client=0x6a2a80, msg=0x7f2f78ffdf14, len=31, csid=0x740294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for clientid 0x500 XID 13 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for 587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got