Re: [Linux-HA] CentOS 7 Pacemaker/Corosync

2015-03-09 Thread Lukas Grossar
Hi,

I can't offer a solution, but I remember having run into similar issues
when the output of hostname showed the FQDN.

Maybe that helps you narrow down the problem.
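
A minimal sketch of that check, assuming Python is available on the nodes.
The helper names are made up for illustration. It only reports whether the
local hostname carries a domain part, which is the situation where the same
machine can appear under two names (for example linsrv006 and
linsrv006.willi-net.local in the pcs output quoted below).

```python
import socket


def is_fqdn(name: str) -> bool:
    """Return True if the name contains a domain part (at least one dot)."""
    return "." in name.strip(".")


def short_name(name: str) -> str:
    """Return the host part before the first dot."""
    return name.strip(".").split(".", 1)[0]


if __name__ == "__main__":
    # socket.gethostname() returns the same value the `hostname` command prints.
    local = socket.gethostname()
    if is_fqdn(local):
        print(f"hostname is an FQDN: {local} (short name: {short_name(local)})")
    else:
        print(f"hostname is a short name: {local}")
```

Running it on every node shows quickly whether the nodes agree on short
names or FQDNs.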

Best regards
Lukas Grossar

On Fri, 06 Mar 2015 09:12:47 +0100
Willi Fehler willi.feh...@t-online.de wrote:

 Hi,
 
 I'm trying to build a Pacemaker/Corosync Cluster on CentOS7. The
 default Corosync configuration with one ring is working but then I
 only have 1 ring and no encryption.
 
 RING ID 1
  id= 10.10.10.1
  status= ring 1 active with no faults
 
 I've tried to activate the following configuration but it doesn't
 work.
 
 [root@linsrv006 corosync]# cat /etc/corosync/corosync.conf
 totem {
     version: 2
     secauth: on
     threads: 0
     rrp_mode: active
     interface {
         ringnumber: 0
         bindnetaddr: 192.168.0.0
         mcastaddr: 226.94.42.7
         mcastport: 5411
     }
     interface {
         ringnumber: 1
         bindnetaddr: 10.10.10.0
         mcastaddr: 226.94.42.11
         mcastport: 5419
     }
     token: 1
     token_retransmits_before_loss_const: 40
     rrp_problem_count_timeout: 2
     nodeid: 3
 }
 quorum {
     provider: corosync_votequorum
     expected_votes: 3
 }
 
 logging {
     to_syslog: yes
 }
 
 
 Another problem is that pcs status then shows 3 nodes.
 
 [root@linsrv006 corosync]# pcs status
 Cluster name:
 Last updated: Fri Mar  6 09:10:41 2015
 Last change: Fri Mar  6 09:10:30 2015 via crmd on linsrv006.willi-net.local
 Stack: corosync
 Current DC: NONE
 4 Nodes configured
 9 Resources configured
 
 
 OFFLINE: [ linsrv006 linsrv006.willi-net.local linsrv006.willi-net.local linsrv007 ]
 
 Full list of resources:
 
  Master/Slave Set: ms_drbd_mysql [drbd_mysql] (unmanaged)
      Stopped: [ linsrv006 linsrv006.willi-net.local linsrv006.willi-net.local linsrv007 ]
  Clone Set: ping-clone [ping] (unmanaged)
      Stopped: [ linsrv006 linsrv006.willi-net.local linsrv006.willi-net.local linsrv007 ]
  Resource Group: mysql
      fs_mysql (ocf::heartbeat:Filesystem): Stopped (unmanaged)
      mysqld (ocf::heartbeat:mysql): Stopped (unmanaged)
      ip_mysql (ocf::heartbeat:IPaddr2): Stopped (unmanaged)
 
 PCSD Status:
 Error: no nodes found in corosync.conf
 
 Please help me fix my Corosync issue.

-- 
Adfinis SyGroup AG
Lukas Grossar, System Engineer

Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06


___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems

[Linux-HA] pgsql resource agent in status Stopped after crm resource cleanup

2014-02-21 Thread Lukas Grossar
Hi

I'm currently building a two-node, DRBD-backed PostgreSQL cluster on Debian
Wheezy, and I'm testing how Pacemaker reacts to specific failure scenarios.

One scenario I tested currently drives me crazy. When I manually stop
PostgreSQL through pg_ctl, or just kill the master process to simulate a
crash, the pgsql resource agent correctly detects the error and restarts
PostgreSQL.

The problem arises when I later call 'crm resource cleanup pgsql' to delete
the failcount and the failed tasks: the pgsql resource shows up as Stopped,
although in reality it is still running fine. I have the same problem when
I delete the failcount separately and then do the cleanup.

The problem seems to be that pgsql_monitor runs into a timeout:
Feb 21 12:47:59 vm-db-01 crmd: [6494]: WARN: cib_action_update:
rsc_op 44: pgsql_monitor_3 on vm-db-01 timed out

After the timeout pgsql is being restarted, and the interesting thing
is that I can delete the failed action from the timeout without a
problem.

Does anyone have an idea what the problem could be in this case?

Best regards
Lukas


Re: [Linux-HA] heartbeat failover

2014-01-23 Thread Lukas Grossar
Hi Björn

Here is an example of how you can set up corosync to use unicast UDP:
https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu

The important parts are transport: udpu and that you need to
configure every member manually using memberaddr: 10.16.35.115.
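
To make the shape of that section concrete, here is a small Python sketch
that prints a totem block for unicast UDP. It follows the layout of the
linked example file as far as I know it for corosync 1.x (member blocks
inside the interface section). The function name, the second member address
10.16.35.116, the bindnetaddr 10.16.35.0 and the port 5405 are assumptions
for illustration only, so check them against the example file and your own
network before using anything like this.

```python
def render_udpu_totem(bindnetaddr, members, mcastport=5405):
    """Build a corosync totem section that uses unicast UDP (udpu).

    bindnetaddr, members and mcastport are placeholders supplied by the
    caller; nothing here is read from a real cluster.
    """
    lines = [
        "totem {",
        "    version: 2",
        "    transport: udpu",
        "    interface {",
        "        ringnumber: 0",
        f"        bindnetaddr: {bindnetaddr}",
        f"        mcastport: {mcastport}",
    ]
    # With udpu every cluster member has to be listed explicitly.
    for addr in members:
        lines += [
            "        member {",
            f"            memberaddr: {addr}",
            "        }",
        ]
    lines += ["    }", "}"]
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed two-node example; replace the addresses with your own.
    print(render_udpu_totem("10.16.35.0", ["10.16.35.115", "10.16.35.116"]))
```

The same member list has to end up in corosync.conf on every node.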

Best regards
Lukas


On Thu, 23 Jan 2014 13:36:22 +
bjoern.bec...@easycash.de wrote:

 Hello,
 
 thanks a lot! I didn't know that heartbeat is almost deprecated.
 I'll try corosync and pacemaker, but I read that corosync needs to run
 over multicast. Unfortunately, I can't use multicast in my network.
 Do you know of any other possibility? I can't find anything saying that
 corosync can run without multicast.
 
 
 Best regards
 Björn 
 
 -Original Message-
 
 From: linux-ha-boun...@lists.linux-ha.org
 [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
 Sent: Wednesday, 22 January 2014 20:36
 To: General Linux-HA mailing list
 Subject: Re: [Linux-HA] heartbeat failover
 
 On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:
  Hello,
 
  I have a drbd+nfs+heartbeat setup and in general it's working. But
  it takes too long to fail over and I'm trying to tune this.
 
  When node 1 is active and I shut down node 2, node 1 tries to
  activate the cluster. The problem is that node 1 already has the
  primary role, and re-activating it takes time again; during
  this the NFS share isn't available.
 
  Is it possible to disable this? Node 1 doesn't have to do anything if
  it's already in the primary role and the second node is not available.
 
  Best regards, Björn
 
 If this is a new project, I strongly recommend switching out
 heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't
 been developed in a long time and there are no plans to restart
 development in the future. Everything (even RH) is standardizing on
 the corosync+pacemaker stack, so it has the most vibrant community as
 well.
 




Re: [Linux-HA] stonithd doesn't use power off action

2013-02-20 Thread Lukas Grossar
On 20.02.2013 09:28, Bernd Schubert wrote:
 On 02/19/2013 10:58 PM, Andrew Beekhof wrote:
 On Tue, Feb 19, 2013 at 11:26 PM, Bernd Schubert
 bernd.schub...@fastmail.fm wrote:
 On 02/19/2013 06:53 AM, Andrew Beekhof wrote:
 On Mon, Feb 18, 2013 at 7:34 PM, Bruce Ford pyrok...@gmail.com wrote:
 Lukas,

 thanks for the quick reply.

 On Fri, Feb 15, 2013 at 4:54 PM, Lukas Grossar
 lukas.gros...@adfinis-sygroup.ch wrote:
 On 15.02.2013 16:43, Bruce Ford wrote:
 Hi,

 I'm running pacemaker 1.1.7 on RedHat 6.3 using the fence_ipmilan
 fence agent from the fence-agents 3.1.5 package.

 I found that although I have chosen the action off, this doesn't
 power off the target node but reboots it with a graceful shutdown. So
 I investigated on the commandline:

 I ran into the same problem when setting up a cluster using CentOS 6.3
 and sent a mail to the mailing list about a week ago and got the
 following reaction from Andrew Beekhof:

 Prior to 6.4 there was some inconsistency between the various agents
 and whether they supported action or option.
 An upgrade to 6.4 in the next few weeks should solve this for you.

 Does 6.4 mean RedHat/Centos 6.4? What a pity, this is currently not an 
 option.
 Will we face serious problems trying to backport the new fence-agents 
 package?

 No, should be pretty straightforward

 So that will introduce another serious change of behaviour in RHEL 6.4?

 No. All agents now support action.  Anything that used to support
 option will continue to do so.
 
 Hmm, I'm still not sure if I understand it correctly. So with 6.4 one 
 has to set (in crm syntax):
 
 property stonith-action=reboot
 ?
 
 Right now we have:
 
 property stonith-action=poweroff
 
 and the fence_ipmilan option: action=off
 
 And that leads to a reboot, as it is supposed to do for this 
 installation.

That may be what it is supposed to do for this installation, but it is
not what it is supposed to do according to the documentation/man page.

 I definitely remember that my colleague who did this 
 installation had some trouble to get fence_ipmilan to do what we 
 intended to do.
 
 
 Thanks,
 Bernd



Re: [Linux-HA] stonithd doesn't use power off action

2013-02-15 Thread Lukas Grossar
On 15.02.2013 16:43, Bruce Ford wrote:
 Hi,
 
 I'm running pacemaker 1.1.7 on RedHat 6.3 using the fence_ipmilan
 fence agent from the fence-agents 3.1.5 package.
 
 I found that although I have chosen the action off, this doesn't
 power off the target node but reboots it with a graceful shutdown. So
 I investigated on the commandline:

I ran into the same problem when setting up a cluster using CentOS 6.3
and sent a mail to the mailing list about a week ago and got the
following reaction from Andrew Beekhof:

 Prior to 6.4 there was some inconsistency between the various agents
 and whether they supported action or option.
 An upgrade to 6.4 in the next few weeks should solve this for you.

Regards,
Lukas



[Linux-HA] stonith:fence_ipmilan ignores action parameter

2013-02-06 Thread Lukas Grossar
Hi,

I'm running a CentOS 6.3 based two-node cluster and wanted to avoid a
possible STONITH death match. I configured STONITH to shut down the
other node instead of rebooting it by setting action=off in the
STONITH primitive

primitive r_STONITH_ipmi_sr-easyc-lamp-01 stonith:fence_ipmilan \
    params auth=password ipaddr=192.168.42.21 \
        login=stonith passwd=xxx action=off priority=1 \
    op start interval=0 timeout=20 \
    op stop interval=0 timeout=20 \
    op monitor interval=300s timeout=60

but STONITH kept rebooting the machine. I found an earlier post [1] on
the mailing list discussing the exact same problem. I already tried to
set stonith-action=poweroff but then fence_ipmilan complains that it
doesn't understand poweroff.

Any help would be appreciated.

Regards,
Lukas

[1] http://lists.linux-ha.org/pipermail/linux-ha/2011-February/042423.html


-- 
Adfinis SyGroup AG
Lukas Grossar, Junior System Engineer

Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06

