Re: [Linux-HA] CentOS 7 Pacemaker/Corosync
Hi,

I can't offer a solution, but I remember running into similar issues when the output of hostname showed the FQDN. Maybe that helps you narrow down the problem.

Best regards
Lukas Grossar

On Fri, 06 Mar 2015 09:12:47 +0100 Willi Fehler willi.feh...@t-online.de wrote:

Hi,

I'm trying to build a Pacemaker/Corosync cluster on CentOS 7. The default Corosync configuration with one ring is working, but then I only have 1 ring and no encryption.

    RING ID 1
        id = 10.10.10.1
        status = ring 1 active with no faults

I've tried to activate the following configuration, but it doesn't work.

    [root@linsrv006 corosync]# cat /etc/corosync/corosync.conf
    totem {
        version: 2
        secauth: on
        threads: 0
        rrp_mode: active
        interface {
            ringnumber: 0
            bindnetaddr: 192.168.0.0
            mcastaddr: 226.94.42.7
            mcastport: 5411
        }
        interface {
            ringnumber: 1
            bindnetaddr: 10.10.10.0
            mcastaddr: 226.94.42.11
            mcastport: 5419
        }
        token: 1
        token_retransmits_before_loss_const: 40
        rrp_problem_count_timeout: 2
        nodeid: 3
    }
    quorum {
        provider: corosync_votequorum
        expected_votes: 3
    }
    logging {
        to_syslog: yes
    }

Another problem is that pcs status is showing 3 nodes.

    [root@linsrv006 corosync]# pcs status
    Cluster name:
    Last updated: Fri Mar 6 09:10:41 2015
    Last change: Fri Mar 6 09:10:30 2015 via crmd on linsrv006.willi-net.local
    Stack: corosync
    Current DC: NONE
    4 Nodes configured
    9 Resources configured

    OFFLINE: [ linsrv006 linsrv006.willi-net.local linsrv006.willi-net.local linsrv007 ]

    Full list of resources:

     Master/Slave Set: ms_drbd_mysql [drbd_mysql] (unmanaged)
         Stopped: [ linsrv006 linsrv006.willi-net.local linsrv006.willi-net.local linsrv007 ]
     Clone Set: ping-clone [ping] (unmanaged)
         Stopped: [ linsrv006 linsrv006.willi-net.local linsrv006.willi-net.local linsrv007 ]
     Resource Group: mysql
         fs_mysql (ocf::heartbeat:Filesystem): Stopped (unmanaged)
         mysqld (ocf::heartbeat:mysql): Stopped (unmanaged)
         ip_mysql (ocf::heartbeat:IPaddr2): Stopped (unmanaged)

    PCSD Status:
    Error: no nodes found in corosync.conf

Please help me fix my Corosync issue.

--
Adfinis SyGroup AG
Lukas Grossar, System Engineer
Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
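The FQDN hint can be checked against the node list that pcs status prints. The following is only a minimal sketch, assuming the duplicate entries come from the same host being registered once under its short name and once under its FQDN; the helper names `group_by_short_name` and `ambiguous_nodes` are hypothetical and not part of pcs or Pacemaker.

```python
from collections import defaultdict


def group_by_short_name(node_names):
    """Group node names by the part before the first dot (the short hostname)."""
    groups = defaultdict(set)
    for name in node_names:
        short = name.split(".", 1)[0].lower()
        groups[short].add(name)
    return {short: sorted(names) for short, names in groups.items()}


def ambiguous_nodes(node_names):
    """Return only the hosts that are known under more than one name."""
    return {
        short: names
        for short, names in group_by_short_name(node_names).items()
        if len(names) > 1
    }


# Node list copied from the OFFLINE line of the pcs status output above.
nodes = [
    "linsrv006",
    "linsrv006.willi-net.local",
    "linsrv006.willi-net.local",
    "linsrv007",
]
print(ambiguous_nodes(nodes))
```

A host that shows up in the result is a candidate for the short name versus FQDN mismatch; whether that is the actual cause here is not confirmed in the thread.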
[Linux-HA] pgsql resource agent in status Stopped after crm resource cleanup
Hi,

I'm currently building a 2-node DRBD-backed PostgreSQL cluster on Debian Wheezy, and I'm testing how Pacemaker reacts to specific failure scenarios.

One thing I tested is currently driving me crazy: when I manually stop PostgreSQL through pg_ctl, or just kill the master process to simulate a crash, the pgsql resource agent correctly detects the error and restarts PostgreSQL. The problem arises when I later call 'crm resource cleanup pgsql' to delete the failcount and the failed tasks: the pgsql resource shows up as Stopped, but in reality it is still running fine. I have the same problem when I delete the failcount separately and then do the cleanup.

The problem seems to be that pgsql_monitor runs into a timeout:

    Feb 21 12:47:59 vm-db-01 crmd: [6494]: WARN: cib_action_update: rsc_op 44: pgsql_monitor_3 on vm-db-01 timed out

After the timeout pgsql is restarted, and the interesting thing is that I can delete the failed action from the timeout without a problem.

Does anyone have an idea what the problem could be in this case?

Best regards
Lukas

--
Adfinis SyGroup AG
Lukas Grossar, System Engineer
Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06
Re: [Linux-HA] heartbeat failover
Hi Björn,

Here is an example of how you can set up corosync to use unicast UDP:

https://github.com/fghaas/corosync/blob/master/conf/corosync.conf.example.udpu

The important parts are transport: udpu and that you need to configure every member manually using memberaddr: 10.16.35.115.

Best regards
Lukas

On Thu, 23 Jan 2014 13:36:22 + bjoern.bec...@easycash.de wrote:

Hello,

thanks a lot! I didn't know that heartbeat is almost deprecated. I'll try corosync and pacemaker, but I read that corosync needs to run over multicast. Unfortunately, I can't use multicast in my network. Do you know of any other possibility? I can't find anything saying that corosync can run without multicast.

Best regards
Björn

-----Original Message-----
From: linux-ha-boun...@lists.linux-ha.org [mailto:linux-ha-boun...@lists.linux-ha.org] On behalf of Digimer
Sent: Wednesday, 22 January 2014 20:36
To: General Linux-HA mailing list
Subject: Re: [Linux-HA] heartbeat failover

On 22/01/14 10:44 AM, bjoern.bec...@easycash.de wrote:

Hello,

I have a drbd+nfs+heartbeat setup, and in general it's working. But it takes too long to fail over, and I'm trying to tune this. When node 1 is active and I shut down node 2, node 1 tries to activate the cluster. The problem is that node 1 already has the primary role, and the re-activation takes time again; during this the NFS share isn't available. Is it possible to disable this? Node 1 doesn't have to do anything if it's already in the primary role and the second node is not available.

Best regards
Björn

If this is a new project, I strongly recommend switching out heartbeat for corosync/pacemaker. Heartbeat is deprecated, hasn't been developed in a long time and there are no plans to restart development in the future. Everything (even RH) is standardizing on the corosync+pacemaker stack, so it has the most vibrant community as well.

--
Adfinis SyGroup AG
Lukas Grossar, System Engineer
Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06
Re: [Linux-HA] stonithd doesn't use power off action
On 20.02.2013 09:28, Bernd Schubert wrote:
On 02/19/2013 10:58 PM, Andrew Beekhof wrote:
On Tue, Feb 19, 2013 at 11:26 PM, Bernd Schubert bernd.schub...@fastmail.fm wrote:
On 02/19/2013 06:53 AM, Andrew Beekhof wrote:
On Mon, Feb 18, 2013 at 7:34 PM, Bruce Ford pyrok...@gmail.com wrote:

Lukas, thanks for the quick reply.

On Fri, Feb 15, 2013 at 4:54 PM, Lukas Grossar lukas.gros...@adfinis-sygroup.ch wrote:
On 15.02.2013 16:43, Bruce Ford wrote:

Hi, I'm running pacemaker 1.1.7 on RedHat 6.3 using the fence_ipmilan fence agent from the fence-agents 3.1.5 package. I found that although I have chosen the action off, this doesn't power off the target node but reboots it with a graceful shutdown. So I investigated on the commandline:

I ran into the same problem when setting up a cluster using CentOS 6.3 and sent a mail to the mailing list about a week ago and got the following reaction from Andrew Beekhof:

Prior to 6.4 there was some inconsistency between the various agents and whether they supported action or option. An upgrade to 6.4 in the next few weeks should solve this for you.

Does 6.4 mean RedHat/Centos 6.4? What a pity, this is currently not an option. Will we face serious problems trying to backport the new fence-agents package?

No, should be pretty straightforward

So that will introduce another serious change of behaviour in RHEL 6.4?

No. All agents now support action. Anything that used to support option will continue to do so.

Hmm, I'm still not sure if I understand it correctly. So with 6.4 one has to set (in crm syntax):

    property stonith-action=reboot ?

Right now we have:

    property stonith-action=poweroff

and the fence_ipmilan option:

    action=off

And that leads to a reboot, as it is supposed to do for this installation.

That may be what it is supposed to do for this installation, but it is not what it is supposed to do according to the documentation/man page.

I definitely remember that my colleague who did this installation had some trouble to get fence_ipmilan to do what we intended to do.

Thanks,
Bernd
Re: [Linux-HA] stonithd doesn't use power off action
On 15.02.2013 16:43, Bruce Ford wrote:

Hi, I'm running pacemaker 1.1.7 on RedHat 6.3 using the fence_ipmilan fence agent from the fence-agents 3.1.5 package. I found that although I have chosen the action off, this doesn't power off the target node but reboots it with a graceful shutdown. So I investigated on the commandline:

I ran into the same problem when setting up a cluster using CentOS 6.3 and sent a mail to the mailing list about a week ago and got the following reaction from Andrew Beekhof:

Prior to 6.4 there was some inconsistency between the various agents and whether they supported action or option. An upgrade to 6.4 in the next few weeks should solve this for you.

Regards,
Lukas
[Linux-HA] stonith:fence_ipmilan ignores action parameter
Hi,

I'm running a CentOS 6.3 based two-node cluster and wanted to avoid a possible STONITH death match. I configured STONITH to shut down the other node instead of rebooting it by setting action=off in the STONITH primitive

    primitive r_STONITH_ipmi_sr-easyc-lamp-01 stonith:fence_ipmilan \
        params auth=password ipaddr=192.168.42.21 \
        login=stonith passwd=xxx action=off priority=1 \
        op start interval=0 timeout=20 \
        op stop interval=0 timeout=20 \
        op monitor interval=300s timeout=60

but STONITH kept rebooting the machine.

I found an earlier post [1] on the mailing list discussing the exact same problem. I already tried setting stonith-action=poweroff, but then fence_ipmilan complains that it doesn't understand poweroff.

Any help would be appreciated.

Regards,
Lukas

[1] http://lists.linux-ha.org/pipermail/linux-ha/2011-February/042423.html

--
Adfinis SyGroup AG
Lukas Grossar, Junior System Engineer
Keltenstrasse 98 | CH-3018 Bern
Tel. 031 550 31 11 | Direkt 031 550 31 06
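One way to narrow this down is to call the fence agent outside of Pacemaker with the same parameters as the primitive. The sketch below assumes that fence_ipmilan, like the other agents in the fence-agents package, accepts its options as name=value lines on standard input when started without arguments; verify this against the man page of the installed version. The helper names `fence_stdin_payload` and `run_fence_agent` are hypothetical, and the example deliberately uses action=status so that nothing is powered off.

```python
import subprocess


def fence_stdin_payload(params):
    """Format agent parameters as name=value lines, one per line."""
    return "".join(f"{name}={value}\n" for name, value in params.items())


def run_fence_agent(agent, params, timeout=30):
    """Run the agent with the parameters on stdin and return the result.

    Not called in this sketch; it needs the agent installed and a reachable
    IPMI interface.
    """
    return subprocess.run(
        [agent],
        input=fence_stdin_payload(params),
        text=True,
        capture_output=True,
        timeout=timeout,
    )


# Parameter names and values taken from the primitive above, except for
# action, which is set to the harmless "status" instead of "off".
params = {
    "auth": "password",
    "ipaddr": "192.168.42.21",
    "login": "stonith",
    "passwd": "xxx",
    "action": "status",
}
print(fence_stdin_payload(params), end="")
```

Calling `run_fence_agent("fence_ipmilan", params)` on a cluster node and then repeating it with action=off on a test machine would show whether the agent itself ignores the action or whether the reboot is requested by the cluster.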