Re: [Pacemaker] Favor one node during stonith?

2014-08-14 Thread Andrei Borzenkov
On Thu, 14 Aug 2014 12:45:27 +1000 Andrew Beekhof and...@beekhof.net wrote: It statically assigns priorities to cluster nodes. I need to dynamically assign higher priority (lower delay) to a node that is currently running application to ensure that application survives. It was

[Pacemaker] Master is restarted when other node comes online

2014-09-28 Thread Andrei Borzenkov
I have two node cluster with single master/slave resource (replicated database) using pacemaker+openais on SLES11 SP3 (pacemaker 1.1.11-3ca8c3b). I hit weird situation that I did not see before, and I cannot really understand it. Assuming master runs on node A and slave runs on node B. If I stop

Re: [Pacemaker] Master is restarted when other node comes online

2014-09-28 Thread Andrei Borzenkov
explains dependencies between two cloned resources. I have single cloned resource, without any external dependency. I'm afraid I miss how interleave helps here. Thank you! 2014-09-28 9:56 GMT+02:00 Andrei Borzenkov arvidj...@gmail.com: I have two node cluster with single master/slave resource

[Pacemaker] Show all resource properties with crmsh

2014-10-02 Thread Andrei Borzenkov
Is it possible to display values for all resource properties, including those set to default values? cibadmin or crm configure show display only explicitly set properties, and crm_resource or crm resource meta work with single property only. Ideally I'd like to get actual values of all resource

[Pacemaker] When pacemaker expects resource to go directly to Master after start?

2014-10-02 Thread Andrei Borzenkov
According to documentation (Pacemaker 1.1.x explained) when [Master/Slave] the resource is started, it must come up in the mode called Slave. But what I observe here - in some cases pacemaker treats Slave state as error. As example (pacemaker 1.1.9): Oct 2 13:23:34 cn1 pengine[9446]: notice:

Re: [Pacemaker] When pacemaker expects resource to go directly to Master after start?

2014-10-02 Thread Andrei Borzenkov
/ocf/resource.d/test/Dummy... * Your agent does not support the notify action (optional) /usr/lib/ocf/resource.d/test/Dummy passed all tests cn1:/usr/lib/ocf/resource.d # 2014-10-02 12:02 GMT+02:00 Andrei Borzenkov arvidj...@gmail.com: According to documentation (Pacemaker 1.1.x explained

[Pacemaker] Stopping/restarting pacemaker without stopping resources?

2014-10-16 Thread Andrei Borzenkov
The primary goal is to transparently update software in cluster. I just did HA suite update using simple RPM and observed that RPM attempts to restart stack (rcopenais try-restart). So a) if it worked, it would mean resources had been migrated from this node - interruption b) it did not work -

Re: [Pacemaker] Show all resource properties with crmsh

2014-10-17 Thread Andrei Borzenkov
On Tue, 14 Oct 2014 11:06:28 +0200 Dejan Muhamedagic deja...@fastmail.fm wrote: Do you have any specific meta attributes in mind? Taking a cursory look doesn't reveal anything very interesting. I was recently bitten by interleave. Having not much experience so far, I do not know if there

Re: [Pacemaker] Y should pacemaker be started simultaneously.

2014-10-17 Thread Andrei Borzenkov
On Mon, 06 Oct 2014 10:27:49 -0400 Digimer li...@alteeve.ca wrote: On 06/10/14 02:11 AM, Andrei Borzenkov wrote: On Mon, Oct 6, 2014 at 9:03 AM, Digimer li...@alteeve.ca wrote: If stonith was configured, after the time out, the first node would fence the second node (unable to reach != off

[Pacemaker] How to find out why pacemaker skipped action?

2014-10-21 Thread Andrei Borzenkov
Pacemaker 1.1.11. I see in engine logs that it is going to restart resource: Oct 21 12:34:50 n2 pengine[19748]: notice: LogActions: Restart rsc_SAPHana_HDB_HDB00:0 (Master n2) But I never see actual stop/start action being executed and in summary I get Oct 21 12:35:11 n2 crmd[19749]:

Re: [Pacemaker] How to find out why pacemaker skipped action?

2014-10-21 Thread Andrei Borzenkov
On Wed, Oct 22, 2014 at 8:01 AM, Andrew Beekhof and...@beekhof.net wrote: On 21 Oct 2014, at 11:15 pm, Andrei Borzenkov arvidj...@gmail.com wrote: Pacemaker 1.1.11. I see in engine logs that it is going to restart resource: Oct 21 12:34:50 n2 pengine[19748]: notice: LogActions: Restart

Re: [Pacemaker] meta failure-timeout: crashed resource is assumed to be Started?

2014-10-23 Thread Andrei Borzenkov
On Thu, 23 Oct 2014 13:46:00 +0200 Carsten Otto carsten.o...@andrena.de wrote: Dear all, I did not get any response so far. Could you please find the time and tell me how the meta failure-timeout is supposed to work, in combination with monitor operations? If you attach unedited logs from

Re: [Pacemaker] Master-slave master not promoted on Corosync restart

2014-10-24 Thread Andrei Borzenkov
On Fri, Oct 24, 2014 at 2:00 PM, Sékine Coulibaly scoulib...@gmail.com wrote: Hi Andrew, Yep, forgot the attachments. I did reproduce the issue, please find the bz2 files attached. Please tell if you need hb_report being used. You forgot to attach log matching these files.

Re: [Pacemaker] Stopping/restarting pacemaker without stopping resources?

2014-10-24 Thread Andrei Borzenkov
On Fri, Oct 24, 2014 at 9:17 AM, Andrew Beekhof and...@beekhof.net wrote: On 16 Oct 2014, at 9:31 pm, Andrei Borzenkov arvidj...@gmail.com wrote: The primary goal is to transparently update software in cluster. I just did HA suite update using simple RPM and observed that RPM attempts

Re: [Pacemaker] MySQL, Percona replication manager - split brain

2014-10-26 Thread Andrei Borzenkov
On Sat, 25 Oct 2014 23:34:54 +0300 Andrew ni...@seti.kr.ua wrote: 25.10.2014 22:34, Digimer wrote: On 25/10/14 03:32 PM, Andrew wrote: Hi all. I use Percona as RA on cluster (nothing mission-critical, currently - just zabbix data); today after restarting MySQL resource (crm resource

Re: [Pacemaker] MySQL, Percona replication manager - split brain

2014-10-26 Thread Andrei Borzenkov
On Sun, 26 Oct 2014 10:51:13 +0200 Andrew ni...@seti.kr.ua wrote: 26.10.2014 08:32, Andrei Borzenkov wrote: On Sat, 25 Oct 2014 23:34:54 +0300 Andrew ni...@seti.kr.ua wrote: 25.10.2014 22:34, Digimer wrote: On 25/10/14 03:32 PM, Andrew wrote: Hi all. I use Percona as RA on cluster

Re: [Pacemaker] split brain - after network recovery - resources can still be migrated

2014-10-26 Thread Andrei Borzenkov
On Sun, 26 Oct 2014 12:01:03 +0100 Vladimir m...@foomx.de wrote: On Sat, 25 Oct 2014 19:11:02 -0400 Digimer li...@alteeve.ca wrote: On 25/10/14 06:35 PM, Vladimir wrote: On Sat, 25 Oct 2014 17:30:07 -0400 Digimer li...@alteeve.ca wrote: On 25/10/14 05:09 PM, Vladimir wrote:

Re: [Pacemaker] Stopping/restarting pacemaker without stopping resources?

2014-10-26 Thread Andrei Borzenkov
On Mon, 27 Oct 2014 11:09:08 +1100 Andrew Beekhof and...@beekhof.net wrote: On 25 Oct 2014, at 12:38 am, Andrei Borzenkov arvidj...@gmail.com wrote: On Fri, Oct 24, 2014 at 9:17 AM, Andrew Beekhof and...@beekhof.net wrote: On 16 Oct 2014, at 9:31 pm, Andrei Borzenkov arvidj

Re: [Pacemaker] Stopping/restarting pacemaker without stopping resources?

2014-10-27 Thread Andrei Borzenkov
On Mon, Oct 27, 2014 at 6:34 AM, Andrew Beekhof and...@beekhof.net wrote: On 27 Oct 2014, at 2:30 pm, Andrei Borzenkov arvidj...@gmail.com wrote: On Mon, 27 Oct 2014 11:09:08 +1100 Andrew Beekhof and...@beekhof.net wrote: On 25 Oct 2014, at 12:38 am, Andrei Borzenkov arvidj...@gmail.com

Re: [Pacemaker] How to find out why pacemaker skipped action?

2014-10-27 Thread Andrei Borzenkov
On Wed, Oct 22, 2014 at 8:59 AM, Andrew Beekhof and...@beekhof.net wrote: On 22 Oct 2014, at 4:34 pm, Andrei Borzenkov arvidj...@gmail.com wrote: On Wed, Oct 22, 2014 at 8:01 AM, Andrew Beekhof and...@beekhof.net wrote: On 21 Oct 2014, at 11:15 pm, Andrei Borzenkov arvidj...@gmail.com wrote

Re: [Pacemaker] How to find out why pacemaker skipped action?

2014-10-27 Thread Andrei Borzenkov
On Mon, Oct 27, 2014 at 12:40 PM, Andrew Beekhof and...@beekhof.net wrote: On 27 Oct 2014, at 7:36 pm, Andrei Borzenkov arvidj...@gmail.com wrote: On Wed, Oct 22, 2014 at 8:59 AM, Andrew Beekhof and...@beekhof.net wrote: On 22 Oct 2014, at 4:34 pm, Andrei Borzenkov arvidj...@gmail.com wrote

Re: [Pacemaker] Split brain and STONITH behavior (VMware fencing)

2014-10-29 Thread Andrei Borzenkov
On Wed, Oct 29, 2014 at 10:46 AM, Ariel S ariel_bis2...@yahoo.co.id wrote: Hello, I'm trying to understand how this STONITH works. I have 2 VMware VMs (moon1a, moon1b) on two different hosts. Each have 2 nic assigned: eth0 for heartbeat while eth1 used for everything else. This is my

[Pacemaker] pacemaker counts probe failure twice

2014-10-29 Thread Andrei Borzenkov
I observe strange behavior that I cannot understand. Pacemaker 1.1.11-3ca8c3b. There is master/slave resource running. Maintenance-mode was set, pacemaker restarted, maintenance-mode reset. This specific RA returns Slave instead of Master for the first probe. But what happens later is rather

Re: [Pacemaker] pacemaker counts probe failure twice

2014-10-29 Thread Andrei Borzenkov
On Thu, 30 Oct 2014 08:32:24 +1100 Andrew Beekhof and...@beekhof.net wrote: On 29 Oct 2014, at 10:01 pm, Andrei Borzenkov arvidj...@gmail.com wrote: I observe strange behavior that I cannot understand. Pacemaker 1.1.11-3ca8c3b. There is master/slave resource running. Maintenance

Re: [Pacemaker] stonith q

2014-11-02 Thread Andrei Borzenkov
On Sun, 2 Nov 2014 10:01:59 + Alex Samad - Yieldbroker alex.sa...@yieldbroker.com wrote: -Original Message- From: Digimer [mailto:li...@alteeve.ca] Sent: Sunday, 2 November 2014 9:49 AM To: The Pacemaker cluster resource manager Subject: Re: [Pacemaker] stonith q On

Re: [Pacemaker] stonith q

2014-11-02 Thread Andrei Borzenkov
: [Pacemaker] stonith q On 02/11/14 06:45 AM, Andrei Borzenkov wrote: On Sun, 2 Nov 2014 10:01:59 + Alex Samad - Yieldbroker alex.sa...@yieldbroker.com wrote: -Original Message- From: Digimer [mailto:li...@alteeve.ca] Sent: Sunday, 2 November 2014 9:49 AM

Re: [Pacemaker] Occasional nonsensical resource agent errors, redux

2014-11-02 Thread Andrei Borzenkov
On Mon, 3 Nov 2014 13:32:45 +1100 Andrew Beekhof and...@beekhof.net wrote: On 1 Nov 2014, at 11:03 pm, Patrick Kane p...@wawd.com wrote: Hi all: In July, list member Ken Gaillot reported occasional nonsensical resource agent errors using Pacemaker

Re: [Pacemaker] Occasional nonsensical resource agent errors, redux

2014-11-03 Thread Andrei Borzenkov
On Mon, 3 Nov 2014 15:26:34 +0100 Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Mon, Nov 03, 2014 at 08:46:00AM +0300, Andrei Borzenkov wrote: On Mon, 3 Nov 2014 13:32:45 +1100 Andrew Beekhof and...@beekhof.net wrote: On 1 Nov 2014, at 11:03 pm, Patrick Kane p...@wawd.com

Re: [Pacemaker] stonith q

2014-11-04 Thread Andrei Borzenkov
On Mon, 3 Nov 2014 07:07:41 + Alex Samad - Yieldbroker alex.sa...@yieldbroker.com wrote: {snip} What I am hearing is that it's not available. Is it possible to hook to a custom script on that event, I can write my own restart Sure you can write your own external stonith script.

Re: [Pacemaker] Fencing dependency between bare metal host and its VMs guest

2014-11-07 Thread Andrei Borzenkov
On Fri, 07 Nov 2014 17:46:40 +0100 Daniel Dehennin daniel.dehen...@baby-gnu.org wrote: Hello, As I finally manage to integrate my VM to corosync and my dlm/clvm/GFS2 are running on it. Now I have one issue, when the bare metal host on which the VM is running die, the VM is lost and can

Re: [Pacemaker] Fencing dependency between bare metal host and its VMs guest

2014-11-10 Thread Andrei Borzenkov
. One thing I do not know how it will behave in case of multiple VMs on the same host. I.e. will pacemaker try to fence host for every VM or recognize that all VMs are dead after the first time agent is invoked. Daniel Dehennin daniel.dehen...@baby-gnu.org wrote: Andrei Borzenkov arvidj

Re: [Pacemaker] Daemon Start attempt on wrong Server

2014-11-11 Thread Andrei Borzenkov
On Tue, 11 Nov 2014 16:19:56 +0100 Hauke Homburg hhomb...@w3-creative.de wrote: On 11.11.2014 13:34, Alexandre wrote: You should use an opt out cluster. Set the cluster option symmetrical=false. This will tell corosync not to place a resource anywhere on the cluster, unless a

Re: [Pacemaker] Long failover

2014-11-14 Thread Andrei Borzenkov
On Fri, Nov 14, 2014 at 2:57 PM, Dmitry Matveichev d.matveic...@mfisoft.ru wrote: Hello, We have a cluster configured via pacemaker+corosync+crm. The configuration is: node master node slave primitive HA-VIP1 IPaddr2 \ params ip=192.168.22.71 nic=bond0 \ op

Re: [Pacemaker] Long failover

2014-11-14 Thread Andrei Borzenkov
On Fri, Nov 14, 2014 at 4:33 PM, Dmitry Matveichev d.matveic...@mfisoft.ru wrote: We've already tried to set it but it didn't help. I doubt it is possible to say anything without logs. Kind regards, Dmitriy Matveichev. -Original Message- From: Andrei

Re: [Pacemaker] Long failover

2014-11-16 Thread Andrei Borzenkov
On Mon, Nov 17, 2014 at 9:34 AM, Andrew Beekhof and...@beekhof.net wrote: On 14 Nov 2014, at 10:57 pm, Dmitry Matveichev d.matveic...@mfisoft.ru wrote: Hello, We have a cluster configured via pacemaker+corosync+crm. The configuration is: node master node slave primitive HA-VIP1

Re: [Pacemaker] Hard error Preventing a service to restart

2014-11-20 Thread Andrei Borzenkov
On Thu, 20 Nov 2014 14:53:10 + Brusq, Jerome jerome.br...@signalis.com wrote: Dear all, We are running corosync-1.4.1-4.el6.x86_64 on RHEL 6.2. We have a cluster with 2 nodes that run as services a Virtual IP and around 20 custom LSB scripts. We had some network issue 2 weeks ago and

Re: [Pacemaker] Issues with ocf_run on CentOS7 with pgsql resource agent

2014-11-21 Thread Andrei Borzenkov
On Fri, Nov 21, 2014 at 12:36 AM, Brendan Reekie bree...@sandvine.com wrote: I’m running into an issue with the pgsql resource agent running on CentOS7. The issue is when the pgsql resource agent attempts a call to runasowner it uses a su as user postgres in a call to ocf_run, this is causing

Re: [Pacemaker] Suicide fencing and watchdog questions

2014-11-28 Thread Andrei Borzenkov
On Thu, 27 Nov 2014 08:24:56 +0300 Vladislav Bogdanov bub...@hoster-ok.com wrote: 27.11.2014 03:43, Andrew Beekhof wrote: On 25 Nov 2014, at 10:37 pm, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi, Is there any information how watchdog integration is intended to work? What

Re: [Pacemaker] Failed-over incomplete

2014-12-04 Thread Andrei Borzenkov
. There is no resource res.vBKN in your logs or configuration snippet you have shown. But why? Since there is no configuration is changed. --teenigma On Thu, Dec 4, 2014 at 1:41 PM, Andrei Borzenkov arvidj...@gmail.com wrote: On Thu, Dec 4, 2014 at 4:56 AM, Teerapatr Kittiratanachai maillist...@gmail.com wrote

Re: [Pacemaker] location minus points

2014-12-17 Thread Andrei Borzenkov
On Wed, 17 Dec 2014 16:38:56 +0100 Thomas Manninger dbgtmas...@gmx.at wrote: Hi, I am using pacemaker on debian in a test environment. pacemaker version is: 1.1.7-1 I have a problem understanding minus points in a location. I have 3 nodes in my test environment, node1, node2 and

Re: [Pacemaker] [libqb]Unlink of files bound to sockets

2014-12-22 Thread Andrei Borzenkov
On Mon, 22 Dec 2014 16:25:00 -0500 (EST) David Vossel dvos...@redhat.com wrote: Linux has this non-portable concept of abstract sockets. This lets us create sockets without mapping directly to something on the filesystem. Unfortunately solaris doesn't have this feature. Somewhere along the

Re: [Pacemaker] CoroSync's UDPu transport for public IP addresses?

2014-12-29 Thread Andrei Borzenkov
On Mon, Dec 29, 2014 at 1:50 PM, Dejan Muhamedagic deja...@fastmail.fm wrote: Hi, On Mon, Dec 29, 2014 at 06:11:49AM +0300, Dmitry Koterov wrote: Hello. I have a geographically distributed cluster, all machines have public IP addresses. No virtual IP subnet exists, so no multicast is

Re: [Pacemaker] Master-Slave role stickiness

2015-01-22 Thread Andrei Borzenkov
On Wed, Jan 21, 2015 at 11:06 PM, brook davis brook.da...@nimboxx.com wrote: Hi, I've got a master-slave resource and I'd like to achieve the following behavior with it: * Only ever run (as master or slave) on 2 specific nodes (out of N possible nodes). These nodes are predetermined and

Re: [Pacemaker] Corosync fails to start when NIC is absent

2015-01-20 Thread Andrei Borzenkov
On Tue, Jan 20, 2015 at 11:50 AM, Jan Friesse jfrie...@redhat.com wrote: Kostiantyn, One more thing to clarify. You said rebind can be avoided - what does it mean? By that I mean that as long as you don't shutdown interface everything will work as expected. Interface shutdown is

Re: [Pacemaker] Unique clone instance is stopped too early on move

2015-01-13 Thread Andrei Borzenkov
On Tue, Jan 13, 2015 at 10:20 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi Andrew, David, all. I found a little bit strange operation ordering during transition execution. Could you please look at the following partial configuration (crmsh syntax)? === ... clone cl-broker broker

Re: [Pacemaker] Two node cluster and no hardware device for stonith.

2015-02-10 Thread Andrei Borzenkov
On Tue, 10 Feb 2015 15:58:57 +0100 Dejan Muhamedagic deja...@fastmail.fm wrote: On Mon, Feb 09, 2015 at 04:41:19PM +0100, Lars Ellenberg wrote: On Fri, Feb 06, 2015 at 04:15:44PM +0100, Dejan Muhamedagic wrote: Hi, On Thu, Feb 05, 2015 at 09:18:50AM +0100, Digimer wrote: That is

Re: [Pacemaker] Configuring fencing with encrypted passwords

2015-02-27 Thread Andrei Borzenkov
On Fri, Feb 27, 2015 at 12:31 PM, Arjun Pandey apandepub...@gmail.com wrote: Hi I am facing some issues while trying out fence_ipmilan on ILO4 setup when using encrypted passwords. Which mailing list can i contact for this ? Basically if i test out fence_ipmilan/fence_ilo4 from cmd line with

Re: [Pacemaker] stonith

2015-04-19 Thread Andrei Borzenkov
On Sun, 19 Apr 2015 14:23:27 +0200 Andreas Kurz andreas.k...@gmail.com wrote: On 2015-04-17 12:36, Thomas Manninger wrote: Hi list, I have a pacemaker/corosync2 setup with 4 nodes, stonith configured over ipmi interface. My problem is, that sometimes, a wrong node is stonithed.

Re: [Pacemaker] Question about multiple instance_attributes

2015-04-13 Thread Andrei Borzenkov
On Mon, Apr 13, 2015 at 12:32 PM, Kazunori INOUE kazunori.ino...@gmail.com wrote: 2015-04-10 15:09 GMT+09:00 Andrei Borzenkov arvidj...@gmail.com: On Fri, Apr 10, 2015 at 8:22 AM, Kazunori INOUE kazunori.ino...@gmail.com wrote: Hi, I defined multiple instance_attributes [*]. * http

Re: [Pacemaker] writing a script

2015-08-12 Thread Andrei Borzenkov
On 12.08.2015 17:18, Michael Böhm wrote: 2015-08-12 15:45 GMT+02:00 Karl Rößmann k.roessm...@fkf.mpg.de: Hi, is there an easy way to determine in a script (bash or python) whether a CRM Resource (Xen Domain) is running or not ? We query the location of a resource in bash with this: if

Re: [Pacemaker] create 2 node cluster with clvm starting with one node

2015-09-24 Thread Andrei Borzenkov
On Thu, Sep 24, 2015 at 3:26 PM, Sven Moeller wrote: > Hi, > > I have to build a 2 node NFS Cluster based on Pacemaker/Corosync. The Volume > used for the filesystem that will be exported by NFS is on a shared storage. > I would like to use cLVM on this Volume. Why do