Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Bjoern.Becker
Hello,

thanks a lot! I didn't know that heartbeat is deprecated.
I'll try corosync and pacemaker, but I read that corosync needs to run over 
multicast.
Unfortunately, I can't use multicast in my network. Do you know of any other 
possibility? I can't find anything saying that corosync can run without multicast.


Best regards
Björn 

-----Original Message-----

From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
Sent: Wednesday, 22 January 2014 20:36
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] heartbeat failover

On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:
 Hello,

 I have a drbd+nfs+heartbeat setup and in general it's working. But it takes too 
 long to fail over, and I'm trying to tune this.

 When node 1 is active and I shut down node 2, node 1 tries to re-activate the 
 cluster.
 The problem is that node 1 already holds the primary role, and re-activating 
 takes time again; during this, the NFS share isn't available.

 Is it possible to disable this? Node 1 shouldn't have to do anything if it's 
 already in the primary role and the second node is not available.

 Best regards, Björn

If this is a new project, I strongly recommend switching out heartbeat for 
corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long 
time and there are no plans to restart development in the future. Everything 
(even RH) is standardizing on the corosync+pacemaker stack, so it has the most 
vibrant community as well.
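On the original failback question: pacemaker can be told to keep resources where they currently run, so a returning peer does not trigger an unnecessary move. A minimal crm-shell sketch of that idea (the resource names and the stickiness value are made up for illustration, not from this thread):

```
# Keep resources on the node they currently run on; a recovered node
# will not pull them back, avoiding a second outage.
crm configure rsc_defaults resource-stickiness=200

# Hypothetical DRBD master/slave resource for a DRBD device named "nfs":
crm configure primitive p_drbd_nfs ocf:linbit:drbd \
        params drbd_resource=nfs \
        op monitor interval=29s role=Master \
        op monitor interval=31s role=Slave
crm configure ms ms_drbd_nfs p_drbd_nfs \
        meta master-max=1 clone-max=2 notify=true
```

With stickiness set, the NFS stack would stay on the surviving primary when the second node comes and goes.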

-- 
Digimer
Papers and Projects: https://alteeve.ca/w/
What if the cure for cancer is trapped in the mind of a person without 
access to education?
___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems


Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Lukas Grossar
Hi Björn

Here is an example of how to set up corosync to use unicast UDP:
https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu

The important parts are transport: udpu, and that you need to
configure every member manually using memberaddr: 10.16.35.115.
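For reference, the relevant part of such a configuration might look like the sketch below (the first address follows the linked example; the second member and the subnet are hypothetical, and bindnetaddr must match your local interface's network):

```
totem {
        version: 2
        transport: udpu              # unicast UDP instead of multicast
        interface {
                ringnumber: 0
                bindnetaddr: 10.16.35.0   # network address of the local NIC
                mcastport: 5405           # UDP port, also used by udpu
                member {
                        memberaddr: 10.16.35.115
                }
                member {
                        memberaddr: 10.16.35.116
                }
        }
}
```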

Best regards
Lukas


On Thu, 23 Jan 2014 13:36:22 +
bjoern.bec...@easycash.de wrote:

 Hello,
 
 thanks a lot! I didn't know about heartbeat is almost deprecated.
 I'll try corosync and pacemaker, but I read that corosync need to run
 over multicast. Unfortunately, I can't use multicast in my network.
 Do you know any other possibility, I can't find anything that
 corosync can run without multicast?
 
 
 Best regards
 Björn 
 
 -Ursprüngliche Nachricht-
 
 Von: linux-ha-boun...@lists.linux-ha.org
 [mailto:linux-ha-boun...@lists.linux-ha.org] Im Auftrag von Digimer
 Gesendet: Mittwoch, 22. Januar 2014 20:36 An: General Linux-HA
 mailing list Betreff: Re: [Linux-HA] heartbeat failover
 
 On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:
  Hello,
 
  I got a drbd+nfs+heartbeat setup and in general it's working. But
  it takes to long to failover and I try to tune this.
 
  When node 1 is active and I shutdown node 2, then node 1 try to
  activate the cluster. The problem is, node 1 already got the
  primary role and when re-activating it take time again and during
  this the nfs share isn't available.
 
  Is it possible to disable this? Node 1 don't have to do anything if
  it's already in primary role and the second node is not available.
 
  Mit freundlichen Grüßen / Best regards Björn
 
 If this is a new project, I strongly recommend switching out
 heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't
 been developed in a long time and there are no plans to restart
 development in the future. Everything (even RH) is standardizing on
 the corosync+pacemaker stack, so it has the most vibrant community as
 well.
 



-- 
Adfinis SyGroup AG
Lukas Grossar, System Engineer

Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06



Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Bjoern.Becker
Hi Lukas,

thank you. Well, I have to wait for some firewall changes for UDP port 5405. 

But I'm not sure if what I'm doing is correct.

Node1:
interface {
        member {
                memberaddr: 10.128.61.60 # node 1
        }
        member {
                memberaddr: 10.128.62.60 # node 2
        }
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 10.128.61.0
        mcastport: 5405
}
transport: udpu

Node2:
interface {
        member {
                memberaddr: 10.128.61.60
        }
        member {
                memberaddr: 10.128.62.60
        }
        # The following values need to be set based on your environment
        ringnumber: 0
        bindnetaddr: 10.128.62.0
        mcastport: 5405
}
transport: udpu

Something is definitely wrong. My firewall was under very high load...
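For the firewall change itself: corosync's totem protocol uses the configured mcastport and, for some traffic, mcastport - 1, so both UDP 5404 and 5405 should be allowed between the nodes in both directions. An iptables sketch using the addresses from the configs above (adjust to your environment; this is illustrative, not from the thread):

```
# Allow corosync totem traffic from the peer node (run on each node,
# with the peer's address). 5404:5405 covers mcastport and mcastport - 1.
iptables -A INPUT -p udp -s 10.128.61.60 --dport 5404:5405 -j ACCEPT
iptables -A INPUT -p udp -s 10.128.62.60 --dport 5404:5405 -j ACCEPT
```

Note that cluster heartbeat traffic is constant by design, so routing it through a stateful firewall at all is usually best avoided where possible.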


Best regards
Björn 


-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org 
[mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Lukas Grossar
Sent: Thursday, 23 January 2014 16:54
To: linux-ha@lists.linux-ha.org
Subject: Re: [Linux-HA] heartbeat failover


Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Bjoern.Becker
Uhh... I now have the same configuration as the example config you sent me. 
But I'm causing high CPU load on our Cisco ASA firewall..

I guess this traffic is not normal?

root@node01:/etc/corosync# tcpdump dst port 5405
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
17:41:06.093140 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.097327 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.113418 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.286517 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.291095 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.480221 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.484520 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.500608 IP node01.52580 > node02.5405: UDP, length 82
17:41:06.673721 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.678654 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.867757 IP node01.5405 > node02.5405: UDP, length 70
17:41:06.872492 IP node02.5405 > node01.5405: UDP, length 70
17:41:06.888576 IP node01.52580 > node02.5405: UDP, length 82
17:41:07.061664 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.066304 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.255409 IP node01.5405 > node02.5405: UDP, length 70
17:41:07.260512 IP node02.5405 > node01.5405: UDP, length 70
17:41:07.275601 IP node01.52580 > node02.5405: UDP, length 82

Best regards
Björn 



[Linux-HA] two node cluster with postfix - how to get system mails from both nodes

2014-01-23 Thread Christian Richter
Hello,

 

I'm looking for the right way to integrate postfix into my 2-node cluster.

 

The question: how do I receive system mail from the node where postfix is not
running?

 

This guide brought me to my current solution:

http://www.linux-ha.org/wiki/Postfix_(resource_agent)

 

I start postfix with 

master_service_disable = inet

alternate_config_directories = /etc/postfix.mail

via the system's init script,

and created a second instance, controlled by pacemaker, for the POP/SMTP
stuff; but this depends on the main postfix instance.

Furthermore, I'm having problems starting the second instance with the
resource agent: "the Postfix mail system is not running".

 

Before spending nights solving this, I'd like to ask: is this the way it should
be? Is there an easier way to get system mail from both nodes without needing
two postfix instances?

 

This is my first mail to a mailing list ever, so please be lenient with me
;)

 

thanks

chris

 

 

 

 



Re: [Linux-HA] two node cluster with postfix - how to get system mails from both nodes

2014-01-23 Thread Dimitri Maziuk
On 01/23/2014 11:14 AM, Christian Richter wrote:
 Hello,
 
  
 
 i'm looking for the right way to integrate postfix in my 2 node cluster.

The right way is don't. Read e.g.
http://serverfault.com/questions/303554/how-to-build-a-high-availability-postfix-system
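One common alternative, assuming the goal is only to get system mail off both nodes: run each node's local postfix as a plain "null client" that relays everything to a central mail server, and don't cluster postfix at all. A main.cf sketch along those lines (relay.example.com is a placeholder):

```
# Null-client main.cf: this node sends locally generated system mail
# to a central relay and accepts no mail from the network.
myorigin = $mydomain
relayhost = [relay.example.com]
inet_interfaces = loopback-only
mydestination =
local_transport = error:local delivery is disabled
```

With this on both nodes, system mail is delivered regardless of which node currently holds the cluster resources.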

-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu




Re: [Linux-HA] /usr/sbin/lrmadmin missing from cluster-glue

2014-01-23 Thread Kristoffer Grönlund
On Sat, 28 Dec 2013 11:18:44 -0500
Tom Parker tpar...@cbnco.com wrote:

 Hello
 
 /usr/sbin/lrmadmin is missing from the latest version of cluster-glue
 in SLES SP3.  Has the program been deprecated or is this an issue in
 the packaging of the RPM?
 

Hi,

I know this is a bit late, but I just discovered this email. Yes,
lrmadmin has been deprecated since it is incompatible with recent
versions of pacemaker.

 Thanks
 
 Tom
 



-- 
// Kristoffer Grönlund
// kgronl...@suse.com


[Linux-HA] Antw: Re: heartbeat failover

2014-01-23 Thread Ulrich Windl
We have a server with a network traffic light on the front. With
corosync/pacemaker the light is constantly flickering, even if the cluster
does nothing.
So I guess it's normal. If your firewall has a problem with that, I can tell
you that traffic will increase as your configuration grows and when the cluster
actually does something. I don't know if "communicate like mad" was a design
concept, but on our HP-UX Service Guard cluster there was only the heartbeat
traffic (configured to be one packet every 7 seconds, for good luck reasons
;-)) when the cluster was idle.

I noticed that with cLVM (part of the "log like mad" family) doing mirroring,
the cluster communication frequently breaks down. When you read about the TOTEM
design goals, you might conclude that either the implementation is broken, or
the configuration you use is. Maybe the system is just too complex.

Here are two examples; first the "log like mad", then the corosync
communication problems:

---
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:09 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffd1d4, len=31, csid=0x740294c4, xid=0
Jan 23 13:51:09 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:09 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x500 XID 1 on node 230314ac
Jan 23 13:51:09 o1 lvm[23717]: Syncing device names
Jan 23 13:51:09 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:09 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffd524, len=31, csid=0x740294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x500 XID 6 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 29
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffd874, len=29, csid=0x740294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command LOCK_VG (0x33) for
clientid 0x500 XID 8 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: do_lock_vg: resource 'P_#global', cmd = 0x4
LCK_VG (WRITE|VG), flags = 0x4 ( DMEVENTD_MONITOR ), critical_section = 0
Jan 23 13:51:10 o1 lvm[23717]: Refreshing context
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffdbc4, len=31, csid=0x740294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x500 XID 10 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 520295596 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 537072812 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 587404460 for
0. len 31
Jan 23 13:51:10 o1 lvm[23717]: add_to_lvmqueue: cmd=0x7f2f74000900.
client=0x6a2a80, msg=0x7f2f78ffdf14, len=31, csid=0x740294c4, xid=0
Jan 23 13:51:10 o1 lvm[23717]: process_work_item: remote
Jan 23 13:51:10 o1 lvm[23717]: process_remote_command SYNC_NAMES (0x2d) for
clientid 0x500 XID 13 on node 230314ac
Jan 23 13:51:10 o1 lvm[23717]: Syncing device names
Jan 23 13:51:10 o1 lvm[23717]: LVM thread waiting for work
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got message from nodeid 570627244 for
587404460. len 18
Jan 23 13:51:10 o1 lvm[23717]: 520295596 got