Re: [Pacemaker] pacemaker error

2010-06-23 Thread Vladislav Bogdanov
23.06.2010 13:07, shejimshad M wrote: While installing pacemaker.rpm then we encountered an error 1:pacemaker warning: user mockbuild does not exist - using root warning: group mockbuild does not exist - using root ### [100%] error:

Re: [Pacemaker] [Problem]The problem of the combination of Pacemaker and corosync1.2.7.

2010-08-02 Thread Vladislav Bogdanov
02.08.2010 04:17, renayama19661...@ybb.ne.jp wrote: ... Problem 3) There is a case to fail in the start of a cib process and the attrd process. Jul 30 14:25:46 x3650g attrd: [26258]: ERROR: ais_dispatch: Receiving message body failed: (2) Library error: Resource temporarily unavailable

Re: [Pacemaker] error installing CentOS clvm after using clusterlabs repository

2010-08-03 Thread Vladislav Bogdanov
03.08.2010 10:29, Brett Delle Grazie wrote: ... (c) Try recompiling RHEL 6.x Beta packages - no guarantees here but it should be possible, maybe. To use OCFS2, GFS2 or CLVM with corosync one needs support for userspace cluster stack in DLM, which is missing from EL5 kernel, so this would not

Re: [Pacemaker] Need help using OCFS2 with openais/pacemaker

2010-08-11 Thread Vladislav Bogdanov
11.08.2010 22:09, patrick.ouel...@promutuel.ca wrote: First of all, wow guys great software I love it so far. Second, I hope im posting this at the right place or i'll get flamed. I have followed the great document by Andrew Cluster from scratch but since im using more recent version of

Re: [Pacemaker] Pacemaker initscript

2010-08-25 Thread Vladislav Bogdanov
25.08.2010 08:56, Andrew Beekhof wrote: On Wed, Aug 25, 2010 at 7:39 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi all, pacemaker has # chkconfig - 90 90 in its MCP initscript. Shouldn't it be corrected to 90 10? I thought higher numbers started later and shut down earlier

Re: [Pacemaker] MCP init script to 21/79?

2010-09-03 Thread Vladislav Bogdanov
03.09.2010 19:34, Steven Dake wrote: Nope, they are in a natural order for both start and stop sequences. So lower number means 'do start or stop earlier'. grep '# chkconfig' /etc/init.d/* Ok, thanks. Changed to 10 Given that corosync default is 20/80, shouldnt mcp be 21/79? I think
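The ordering rule quoted above (a lower number means the script is started, or stopped, earlier) can be sketched as follows. This is only an illustration, not code from the thread: the function name `boot_and_shutdown_order` is made up, and the priorities are the corosync default of 20/80 and the 21/79 proposed for the MCP script in the message.

```python
def boot_and_shutdown_order(priorities):
    """priorities maps a service name to its (start, stop) chkconfig numbers.

    Both sequences run in ascending numeric order, as described in the thread.
    """
    start_order = sorted(priorities, key=lambda name: priorities[name][0])
    stop_order = sorted(priorities, key=lambda name: priorities[name][1])
    return start_order, stop_order


# Values taken from the message: corosync 20/80, proposed MCP (pacemaker) 21/79.
services = {"pacemaker": (21, 79), "corosync": (20, 80)}
start_order, stop_order = boot_and_shutdown_order(services)
print("start:", start_order)
print("stop:", stop_order)
```

With these numbers pacemaker starts right after corosync and stops right before it, which seems to be the intent of the 21/79 suggestion.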

Re: [Pacemaker] Couldn't find device [/dev/drbd/by-res/wwwdata]. Expected /dev/??? to exist

2010-09-04 Thread Vladislav Bogdanov
04.09.2010 12:26, Alisson Landim wrote: Hey, i notice something when i restarted both computers. The problem of the kernel bug occurs when the second computer loads, until there the first computer was able to mount using your hint of /dev/drbd1. The file system was mounted and everything

Re: [Pacemaker] kernel BUG at fs/dlm/lock.c:242! after sync of GFS2 (2 node - active/active)

2010-09-07 Thread Vladislav Bogdanov
08.09.2010 05:19, Alisson Landim wrote: After setting up a 2 node cluster following the cluster from scratch guide for Fedora 13 i have to say that GFS2 filesystem (active/active) doesn't work! If the kernel bug described below is not caused by one the modifications i HAD to do following

Re: [Pacemaker] kernel BUG at fs/dlm/lock.c:242! after sync of GFS2 (2 node - active/active)

2010-09-08 Thread Vladislav Bogdanov
08.09.2010 09:25, Alisson Landim wrote: I am updating this post with an info. After stopping the WebFS resource i could create the GFS2 filesystem on the second node so the only difference now from the Cluster from scratch guide is the: 1 - Changed /dev/drbd/by-res/wwwdata to /dev/drbd1 on

Re: [Pacemaker] kernel BUG at fs/dlm/lock.c:242! after sync of GFS2 (2 node - active/active)

2010-09-13 Thread Vladislav Bogdanov
13.09.2010 16:21, Andrew Beekhof wrote: On Wed, Sep 8, 2010 at 6:00 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote: [...] Could you paste output from drbd-overview? drbd device should be in Primary state on node where you issue mount. And... Did you start corosync or openais? Later

[Pacemaker] g_hash_table_lookup: assertion `hash_table != NULL' failed

2010-09-13 Thread Vladislav Bogdanov
Hi Andrew, hi all, Upgraded to todays tip and see bunch of assertion messages in logs (together with some segfaults from pengine). Here is what I see: Sep 13 16:53:43 s01-1 pengine: [4120]: ERROR: crm_abort: xpath_search: Triggered assert at xml.c:2599 : xml_top != NULL Sep 13 16:56:14 s01-1

[Pacemaker] chkconfig values in MCP init script (again)

2010-09-21 Thread Vladislav Bogdanov
Hi Andrew, hi all. I decided to return to this issue again because of issues with libvirt/KVM virtual domains controlled by pacemaker. libvirt package on Fedora 13 has two init scripts: libvirtd and libvirt-guests. They have following chkconfig values: libvirtd: 97 03 libvirt-guests: 98 02

Re: [Pacemaker] chkconfig values in MCP init script (again)

2010-09-22 Thread Vladislav Bogdanov
22.09.2010 11:17, Andrew Beekhof wrote: On Tue, Sep 21, 2010 at 2:24 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi Andrew, hi all. I decided to return to this issue again because of issues with libvirt/KVM virtual domains controlled by pacemaker. libvirt package on Fedora 13 has

[Pacemaker] Doc build issue

2010-09-29 Thread Vladislav Bogdanov
Hi! This patch breaks rpm build and seems to be unneeded (at least on F13) Italian docs are generated without it. http://hg.clusterlabs.org/pacemaker/1.1/diff/ac25a4ecdbcb/doc/Clusters_from_Scratch/publican.cfg.in Symptoms: $ make Clusters_from_Scratch.txt Building Clusters_from_Scratch rm -rf

[Pacemaker] Dependency on either of two resources

2010-10-03 Thread Vladislav Bogdanov
Hi all, just wondering, is there a way to make resource depend on (be colocated with) either of two other resources? Use case is iSCSI initiator connection to iSCSI target with two portals. Idea is to have f.e. device manager multipath resource depend on both iSCSI connection resources, but in a

Re: [Pacemaker] Dependency on either of two resources

2010-10-05 Thread Vladislav Bogdanov
05.10.2010 12:12, Andrew Beekhof wrote: On Mon, Oct 4, 2010 at 6:31 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi all, just wondering, is there a way to make resource depend on (be colocated with) either of two other resources? Not yet. Its something we want to support

Re: [Pacemaker] stonith pacemaker problem

2010-10-11 Thread Vladislav Bogdanov
11.10.2010 09:14, Andrew Beekhof wrote: strictly speaking you don't. but at least on fedora, the policy is that $x-libs always requires $x so just building against heartbeat-libs means that yum will suck in the main heartbeat package :-( And this seem to be a bit incorrect statement btw:

Re: [Pacemaker] stonith pacemaker problem

2010-10-12 Thread Vladislav Bogdanov
12.10.2010 07:25, Andrew Beekhof wrote: On Mon, Oct 11, 2010 at 9:51 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 11.10.2010 09:14, Andrew Beekhof wrote: strictly speaking you don't. but at least on fedora, the policy is that $x-libs always requires $x so just building against

Re: [Pacemaker] Move DRBD master

2010-10-18 Thread Vladislav Bogdanov
19.10.2010 02:18, Vadym Chepkov wrote: Hi, What is the crm shell command to move drbd master to a different node? I didn't find it too. The only way to move ms resource I found is to move some other ordinary resource which is collocated with that ms (drbd) resource. But it remains unclear,

Re: [Pacemaker] Multiple independent two-node clusters side-by-side?

2010-10-27 Thread Vladislav Bogdanov
28.10.2010 03:36, Andreas Ntaflos wrote: ... Short version: How do I configure multiple independent two-node clusters where the nodes are all on the same subnet? Only the two nodes that form the cluster should see that cluster's resources and not any other. Is this possible? Where

Re: [Pacemaker] Manually controlled cluster

2010-11-04 Thread Vladislav Bogdanov
04.11.2010 13:36, Pavlos Parissis wrote: ... why do you want that? Customer request. Definitely NOT my idea. something like this could be useful location master-location ms-drbd_02 rule $id=master-rule $role=Master -1000: #uname eq node-02 -1000 is too weak s/-1000/-inf/ this will
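Why -1000 may be "too weak" can be sketched with Pacemaker's score arithmetic, where scores are added and -INFINITY overrides any finite value. The sketch below is an assumption-laden illustration: `add_scores` is a hypothetical helper, not a Pacemaker API, and the competing score of 5000 is invented for the example.

```python
INFINITY = 1_000_000  # Pacemaker treats scores at or beyond this as infinite


def add_scores(a, b):
    """Combine two scores; -INFINITY wins over everything, then +INFINITY."""
    if a <= -INFINITY or b <= -INFINITY:
        return -INFINITY
    if a >= INFINITY or b >= INFINITY:
        return INFINITY
    return max(-INFINITY, min(INFINITY, a + b))


other_preference = 5000  # hypothetical competing score on node-02

weak = add_scores(-1000, other_preference)
strict = add_scores(-INFINITY, other_preference)
print(weak)    # 4000: still positive, so the Master role could land on node-02
print(strict)  # -1000000: the role can never be placed there
```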

Re: [Pacemaker] crm_mon and pingd

2010-11-05 Thread Vladislav Bogdanov
05.11.2010 19:39, Vadym Chepkov wrote: ... As Yuusuke IIDA pointed out this is a new and expected behavior of crm_mon. To be honest I don't excited about it, since -A flag fills screen with master-drbd scores and not just pingd. watch crm_mon -1A|grep -E \^(\* Node|\+ ping)\ ?

[Pacemaker] CMAN integration questions

2010-12-23 Thread Vladislav Bogdanov
Hi Andrew, It was a big surprise for me to see all pacemaker-specific bits removed from dlm and gfs2 in cluster-3.1.0, so there is currently no way to use pacemaker on f13 with dlm/gfs2/clvm but without cman. So, would you please bring some light on details of integration with cman? Especially

Re: [Pacemaker] CMAN integration questions

2010-12-23 Thread Vladislav Bogdanov
23.12.2010 14:14, Andrew Beekhof wrote: On Thu, Dec 23, 2010 at 10:41 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi Andrew, It was a big surprise for me to see all pacemaker-specific bits removed from dlm and gfs2 in cluster-3.1.0, so there is currently no way to use pacemaker

Re: [Pacemaker] CMAN integration questions

2010-12-24 Thread Vladislav Bogdanov
23.12.2010 14:14, Andrew Beekhof wrote: ... Otherwise I probably need to configure huge timeouts for operations and then cluster becomes not smart. Under 'specific resources' I mean LVM VGs and LVs together with gfs2 filesystems. I currently have problems with fence domain stability

Re: [Pacemaker] Problem with Xen live migration

2011-01-18 Thread Vladislav Bogdanov
18.01.2011 14:45, Vadym Chepkov wrote: On Jan 17, 2011, at 6:44 PM, Jean-Francois Malouin wrote: Back again to setup an active/passive cluster for Xen with live migration but so far, no go. Xen DomU is shutdown and restarted when I move the Xen resource. I'm using Debian Squeeze,

[Pacemaker] crm configure load update

2011-01-18 Thread Vladislav Bogdanov
Hi all, It looks like configuration lines are pushed to running CIB line-by-line during 'crm configure load update', rather than edit-all/commit. I just observed this while pushing dozen of new primitives together with colocation/order constraints - primitives tried to start (and failed) on a

Re: [Pacemaker] Problem with Xen live migration

2011-01-18 Thread Vladislav Bogdanov
Unless clustered LVM locking is enabled and working: # sed -ri 's/^([ \t]+locking_type).*/locking_type = 3/' /etc/lvm/lvm.conf # sed -ri 's/^([ \t]+fallback_to_local_locking).*/ fallback_to_local_locking = 1/' /etc/lvm/lvm.conf # vgchange -cy VG_NAME # service clvmd start # vgs|grep
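The effect of the two sed substitutions above can be sketched in Python on an in-memory copy of the configuration. This is a hedged illustration: `enable_clustered_locking` and the sample text are made up, and unlike the quoted sed expressions the sketch keeps the original indentation through a back-reference.

```python
import re


def enable_clustered_locking(conf_text):
    """Set locking_type = 3 and fallback_to_local_locking = 1 on indented lines."""
    conf_text = re.sub(r"^([ \t]+locking_type).*", r"\1 = 3",
                       conf_text, flags=re.MULTILINE)
    conf_text = re.sub(r"^([ \t]+fallback_to_local_locking).*", r"\1 = 1",
                       conf_text, flags=re.MULTILINE)
    return conf_text


# Made-up fragment in the style of /etc/lvm/lvm.conf.
sample = (
    "global {\n"
    "    # locking_type selects the locking method\n"
    "    locking_type = 1\n"
    "    fallback_to_local_locking = 0\n"
    "}\n"
)
updated = enable_clustered_locking(sample)
print(updated)
```

Because the pattern requires whitespace immediately followed by the option name, commented-out lines are left alone, just as with the sed version.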

Re: [Pacemaker] Problem with Xen live migration

2011-01-18 Thread Vladislav Bogdanov
18.01.2011 15:41, Vadym Chepkov wrote: ... I have tried it myself, but concluded it's impossible to do it reliably with the current code. For the live migration to work you have to remove any colocation constraints (group included) with the Xen resource. drbd code includes a helper

Re: [Pacemaker] Problem with Xen live migration

2011-01-18 Thread Vladislav Bogdanov
18.01.2011 16:00, Vladislav Bogdanov wrote: 18.01.2011 15:41, Vadym Chepkov wrote: ... I have tried it myself, but concluded it's impossible to do it reliably with the current code. For the live migration to work you have to remove any colocation constraints (group included

Re: [Pacemaker] crm configure load update

2011-01-18 Thread Vladislav Bogdanov
18.01.2011 18:22, Dejan Muhamedagic wrote: Hi, On Tue, Jan 18, 2011 at 03:35:15PM +0200, Vladislav Bogdanov wrote: Hi all, It looks like configuration lines are pushed to running CIB line-by-line during 'crm configure load update', rather than edit-all/commit. I just observed this while

Re: [Pacemaker] crm configure load update

2011-02-02 Thread Vladislav Bogdanov
18.01.2011 19:41, Dejan Muhamedagic wrote: [...] No, it shouldn't be so, but I'm not sure. The earlier commit procedure has been quite complex (unnecessarily), and it's still sometimes in use in pacemaker 1.0.x. Is that the version you're running? No, 1.1.4. Updating to latest tip right

Re: [Pacemaker] problem with apache coming up

2011-02-09 Thread Vladislav Bogdanov
08.02.2011 17:13, Testuser SST wrote: Hi, I'm implementing a two node webserver on CentOS 5 with heartbeat/pacemaker and DRBD. The first new installed node works fine, but when the second node becomes active, there seems to be a problem with the apache starting up. Is there a way to get

Re: [Pacemaker] ifstatus OCF RA

2011-02-21 Thread Vladislav Bogdanov
Hi, 21.02.2011 22:29, Florian Haas wrote: On 02/21/2011 08:00 PM, Frederik Schüler wrote: Hello *, as the various ocf:*:ping[d] incarnations don't meet my specific needs, May I ask why and how? I was thinking about writing something similar. I need this in a quite complex networking setup,

Re: [Pacemaker] ifstatus OCF RA

2011-02-22 Thread Vladislav Bogdanov
. # # Copyright (c) 2011 Vladislav Bogdanov bub...@hoster-ok.com # Partially based on 'ping' RA by Andrew Beekhof # # OCF instance parameters: #OCF_RESKEY_name: name of attribute to set in CIB #OCF_RESKEY_iface: space separated list of network interfaces to monitor #OCF_RESKEY_bridge_ports

Re: [Pacemaker] ifstatus OCF RA

2011-02-22 Thread Vladislav Bogdanov
Hi again, attached is a bit more polished revision. Best, Vladislav 22.02.2011 15:01, Vladislav Bogdanov wrote: Hi Dejan, 22.02.2011 13:02, Dejan Muhamedagic wrote: Hi, Where can you get STP stuff from? How to interpret it? And then Please look at attached RA. I decided that today

Re: [Pacemaker] ifstatus OCF RA

2011-02-22 Thread Vladislav Bogdanov
22.02.2011 15:01, Vladislav Bogdanov wrote: Hi Dejan, 22.02.2011 13:02, Dejan Muhamedagic wrote: Hi, Where can you get STP stuff from? How to interpret it? And then Please look at attached RA. I decided that today is a good time to finally brace myself to find 5 hours to write

Re: [Pacemaker] ifstatus OCF RA

2011-02-22 Thread Vladislav Bogdanov
of network interface and records it as a value in CIB # based on summ of speeds of its active underlying interfaces. # # Copyright (c) 2011 Vladislav Bogdanov bub...@hoster-ok.com # Partially based on 'ping' RA by Andrew Beekhof # # OCF instance parameters: #OCF_RESKEY_name: name

Re: [Pacemaker] ifstatus OCF RA

2011-02-24 Thread Vladislav Bogdanov
25.02.2011 00:08, Serge Dubrouski wrote: I wonder if that would be possible to make Pacemaker to move virtual IP from one interface to another (not from one node to another) using an RA like this one. I just use STP-enabled bridge for this purpose. All ports except one are blocked by STP until

Re: [Pacemaker] ifstatus OCF RA

2011-02-24 Thread Vladislav Bogdanov
25.02.2011 00:08, Serge Dubrouski wrote: I wonder if that would be possible to make Pacemaker to move virtual IP from one interface to another (not from one node to another) using an RA like this one. I just use STP-enabled bridge for this purpose. All ports except one are blocked by STP

Re: [Pacemaker] build Issue while configuring the cluster glue in CENT OS

2011-03-08 Thread Vladislav Bogdanov
08.03.2011 12:25, rakesh k wrote: Larry Brigman larry.brigman@... writes: Hi Larry Thanks for you suggestion the when i tried to install e2fsprogs-libs-1.39-23.el5_5.1 rpm file it says it is uptodate and again i tried to install cluster glue using make command . which thrown me the

Re: [Pacemaker] build Issue while configuring the cluster glue in CENT OS

2011-03-08 Thread Vladislav Bogdanov
09.03.2011 07:35, rakesh k wrote: When i listed down all the packages in CentOS i found this package is already isntalled i used rpm-qa for listing down the packages Ahm, el5/centos5 have it in e2fsprogs-devel. Install that package and this should help. And there is no

Re: [Pacemaker] heartbeat vs. corosync installation confusion

2011-03-09 Thread Vladislav Bogdanov
09.03.2011 11:08, Lars Ellenberg wrote: On Wed, Mar 09, 2011 at 09:14:47AM +0100, Andrew Beekhof wrote: On Mon, Mar 7, 2011 at 4:21 PM, Dennis Jacobfeuerborn denni...@conversis.de wrote: Hi, I'm planning to setup a redundant storage system using centos5, pacemaker, corosync, drbd, nfs and

Re: [Pacemaker] ifstatus OCF RA

2011-03-19 Thread Vladislav Bogdanov
:53, Vladislav Bogdanov wrote: Hi Lars, thank you for your time and for so detailed review. Just to dot half of i's (where it is about coding style): 1. I strongly prefer to cleanly separate data access from main logic by API. 2. I prefer to have non-void functions to return result

Re: [Pacemaker] Patch for bugzilla 2541: Shell should warn if parameter uniqueness is violated

2011-03-26 Thread Vladislav Bogdanov
Oops, this is actually a bug in fence_ipmilan which reports all params as unique. 26.03.2011 08:28, Vladislav Bogdanov wrote: Hi, it seems like it was commit d0472a26eda1 which now causes following: WARNING: Resources stonith-v02-a,stonith-v02-b,stonith-v02-c,stonith-v02-d violate

Re: [Pacemaker] [Problem]Reboot by the error of the clone resource influences the resource of other nodes.

2011-04-01 Thread Vladislav Bogdanov
01.04.2011 11:10, Andrew Beekhof wrote: On Fri, Apr 1, 2011 at 9:58 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 01.04.2011 10:20, Andrew Beekhof wrote: The clone instance numbers for anonymous clones are an implementation detail and nothing should be inferred from them. Did anything

Re: [Pacemaker] Immediate fs errors on iscsi connection problem

2011-04-03 Thread Vladislav Bogdanov
Hi, You need some tuning from both sides. First, (at least some versions of) ietd needs to be blocked (-j DROP) with iptables on restarts. That means, you should block all incoming and outgoing packets (later is more important) before ietd stop and unblock all after it starts. I use home-brew

Re: [Pacemaker] [Problem]Reboot by the error of the clone resource influences the resource of other nodes.

2011-04-05 Thread Vladislav Bogdanov
. * http://developerbugs.linux-foundation.org/show_bug.cgi?id=2574 Best Regards, Hideo Yamauchi. --- On Fri, 2011/4/1, Vladislav Bogdanov bub...@hoster-ok.com wrote: 01.04.2011 11:10, Andrew Beekhof wrote: On Fri, Apr 1, 2011 at 9:58 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote

Re: [Pacemaker] A question and demand to a resource placement strategy function

2011-04-26 Thread Vladislav Bogdanov
Hi Yan, 27.04.2011 07:32, Yan Gao wrote: Hi Yuusuke, On 04/19/11 19:55, Yan Gao wrote: Actually I've been optimizing the placement-strategy lately. It will sort the resource processing order according to the priorities and scores of resources. That should result in ideal placement. Stay

Re: [Pacemaker] The different between lanplus and open for IPMI

2011-05-24 Thread Vladislav Bogdanov
Hi, 24.05.2011 11:54, Dejan Muhamedagic wrote: Hi, On Tue, May 24, 2011 at 03:31:01PM +0800, jiaju liu wrote: HI all I use ipmi as stonith resource. the interface I use lanplus first, however, it doesn't work. I check log,it shows nodea external/ipmi[3433]: ERROR: error executing

Re: [Pacemaker] The different between lanplus and open for IPMI

2011-05-24 Thread Vladislav Bogdanov
Hi, 25.05.2011 03:49, jiaju liu wrote: Hi, On Tue, May 24, 2011 at 03:31:01PM +0800, jiaju liu wrote: HI all I use ipmi as stonith resource. the interface I use lanplus first, however, it doesn't work. I check log,it shows nodea external/ipmi[3433]:

Re: [Pacemaker] Adding a STONITH device to pacemaker

2011-05-29 Thread Vladislav Bogdanov
Hi, 29.05.2011 22:38, Digimer wrote: Hi all, I'd like to adapt my Node Assassin[1] device to support Pacemaker. Currently it supports RHCS and the agent is built to the FenceAgentAPI[2]. I'm hoping it can be fairly easily adapted to Pacemaker's specs. Pacemaker has native support for

[Pacemaker] Node removal in corosync-based cluster

2011-05-29 Thread Vladislav Bogdanov
Hi all. I've got a task to remove some nodes from cluster to save some power and found that it is not sufficient to follow http://www.clusterlabs.org/doc/en-US/Pacemaker/1.1/html/Pacemaker_Explained/s-node-delete.html#s-del-ais After pacemaker is then restarted on any another remaining node,

Re: [Pacemaker] Node removal in corosync-based cluster

2011-05-30 Thread Vladislav Bogdanov
Hi, 30.05.2011 17:12, Dejan Muhamedagic wrote: Hi, On Sun, May 29, 2011 at 11:58:17PM +0300, Vladislav Bogdanov wrote: Hi all. I've got a task to remove some nodes from cluster to save some power and found that it is not sufficient to follow http://www.clusterlabs.org/doc/en-US/Pacemaker

Re: [Pacemaker] Node removal in corosync-based cluster

2011-05-31 Thread Vladislav Bogdanov
31.05.2011 12:32, Dejan Muhamedagic wrote: On Mon, May 30, 2011 at 05:27:41PM +0300, Vladislav Bogdanov wrote: Hi, 30.05.2011 17:12, Dejan Muhamedagic wrote: Hi, On Sun, May 29, 2011 at 11:58:17PM +0300, Vladislav Bogdanov wrote: Hi all. I've got a task to remove some nodes from cluster

[Pacemaker] Two stonith devices

2011-06-04 Thread Vladislav Bogdanov
Hi all, silly question: does anybody have working configuration with two stonith paths to cluster node? Background is: I have two stonith resources configured for each node in two-node cluster: primitive stonith-s01-0 stonith:external/ipmi \ params hostname=s01-0 ipaddr=10.5.4.250

Re: [Pacemaker] Two stonith devices

2011-06-07 Thread Vladislav Bogdanov
with? And, is design of stonith subsystem documented somewhere? Can you shortly describe it if it is not? Best, Vladislav 04.06.2011 12:08, Vladislav Bogdanov wrote: Hi all, silly question: does anybody have working configuration with two stonith paths to cluster node? Background is: I have two

Re: [Pacemaker] failed actions: insufficient privileges

2011-06-11 Thread Vladislav Bogdanov
11.06.2011 19:01, Alfredo Parisi wrote: Hi and thanks for the reply. I've found the problem, pacemaker haven't the privileges for create the file mysqld.sock, infact if I stop one server and create mysqld.sock with 777 and own mysql:mysql, after restart corosync, it works... but this is only

Re: [Pacemaker] How to tell pacemaker to start exportfs after filesystem resource

2011-06-21 Thread Vladislav Bogdanov
21.06.2011 17:23, Dejan Muhamedagic wrote: On Tue, Jun 21, 2011 at 06:10:16PM +0400, Aleksander Malaev wrote: How can I check this? If I don't add this exportfs resource then cluster is become the fully operational - all mounts are accesible and fail-over between nodes is working as it

Re: [Pacemaker] How to tell pacemaker to start exportfs after filesystem resource

2011-06-21 Thread Vladislav Bogdanov
of fail. That should not be a problem. portmap (rpcbind) is required for nfs to operate, but not vice-verse. Best, Vladislav. 2011/6/21 Dejan Muhamedagic deja...@fastmail.fm mailto:deja...@fastmail.fm Hi Vladislav, On Tue, Jun 21, 2011 at 05:38:21PM +0300, Vladislav Bogdanov wrote

Re: [Pacemaker] How to tell pacemaker to start exportfs after filesystem resource

2011-06-21 Thread Vladislav Bogdanov
Hi Dejan, 21.06.2011 17:46, Dejan Muhamedagic wrote: Hi Vladislav, On Tue, Jun 21, 2011 at 05:38:21PM +0300, Vladislav Bogdanov wrote: 21.06.2011 17:23, Dejan Muhamedagic wrote: On Tue, Jun 21, 2011 at 06:10:16PM +0400, Aleksander Malaev wrote: How can I check this? If I don't add

Re: [Pacemaker] [DRBD-user] Performing crm failover using crm node standby

2011-06-22 Thread Vladislav Bogdanov
Hi! 22.06.2011 14:44, Florian Haas wrote: ... This is something that has been discussed here previously. A resource agent could report to Pacemaker during monitor (via an exit code named, say, OCF_ERR_DEGRADED) that a resource, or a resource instance in the case of a clone or master/slave

[Pacemaker] Resources are not restarted on definition change after f59d7460bdde (devel)

2011-06-27 Thread Vladislav Bogdanov
Hi all, I'm pretty sure I bisected commit which breaks restart of (node local) resources after definition change. Nodes which has f59d7460bdde applied (v03-a and v03-b in my case) do not restart such resources, while node without this commit (mgmt01) does. Here is snippet from DC (grrr,

[Pacemaker] Race condition in pacemaker/lrmd cooperation right after live migration

2011-07-04 Thread Vladislav Bogdanov
Hi all, There is feeling that race condition is possible during live migration of resources. I put one node to standby mode, that made all resources migrate to another one. Virtual machines were successfully live-migrated, but then marked as FAILED almost immediately. Logs show some interesting

Re: [Pacemaker] Race condition in pacemaker/lrmd cooperation right after live migration

2011-07-08 Thread Vladislav Bogdanov
05.07.2011 10:05, Andrew Beekhof wrote: On Tue, Jul 5, 2011 at 2:37 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 05.07.2011 04:44, Andrew Beekhof wrote: Looks like the VirtualDomain RA isn't correctly implementing stop. Stop of an undefined domain shouldn't produce an error. Nope

[Pacemaker] Reload action and stop/start sequence questions

2011-07-11 Thread Vladislav Bogdanov
Hi all, Would somebody (Andrew?) please bring some light on how exactly redefinition of resource is supposed to be handled? Below is my (rather perfectionistic) vision on this, please correct me if/where I'm wrong: * If RA supports 'reload' action then it is called on resource definition change

Re: [Pacemaker] Resources are not restarted on definition change after f59d7460bdde (devel)

2011-08-03 Thread Vladislav Bogdanov
01.08.2011 02:05, Andrew Beekhof wrote: On Wed, Jul 27, 2011 at 11:46 AM, Andrew Beekhof and...@beekhof.net wrote: On Fri, Jul 1, 2011 at 4:59 PM, Andrew Beekhof and...@beekhof.net wrote: Hmm. Interesting. I will investigate. This is an unfortunate side-effect of my history compression

Re: [Pacemaker] Reload action and stop/start sequence questions

2011-08-03 Thread Vladislav Bogdanov
27.07.2011 05:25, Andrew Beekhof wrote: ... * Dependent resources should not be stopped/started for 'reload' action. Of course they are restarted if reload fails and stop/start is executed then. (I see that they are restarted now for reload of a resource they depend on, is it a bug?) More

Re: [Pacemaker] Resources are not restarted on definition change after f59d7460bdde (devel)

2011-08-08 Thread Vladislav Bogdanov
04.08.2011 06:08, Andrew Beekhof wrote: On Wed, Aug 3, 2011 at 7:35 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 01.08.2011 02:05, Andrew Beekhof wrote: On Wed, Jul 27, 2011 at 11:46 AM, Andrew Beekhof and...@beekhof.net wrote: On Fri, Jul 1, 2011 at 4:59 PM, Andrew Beekhof

[Pacemaker] DC marks remote operation as timed out, leading to stonith

2011-08-09 Thread Vladislav Bogdanov
Hi Andrew, all, Just found little bit strange crmd behaviour - DC timed out stop operation (which lead to node fencing) by itself, without even waiting for lrmd (on another node) to finish that operation. Here is what it did: === Aug 9 07:30:07 mgmt01 crmd: [15781]: WARN:

Re: [Pacemaker] DC marks remote operation as timed out, leading to stonith

2011-08-09 Thread Vladislav Bogdanov
run. At the same time (18:17:22) pacemaker began to stop all other logical volumes resources (except config-templates-lv) and had success. 09.08.2011 11:40, Vladislav Bogdanov wrote: Hi Andrew, all, Just found little bit strange crmd behaviour - DC timed out stop operation (which lead

Re: [Pacemaker] TOTEM: Process pause detected? Leading to STONITH...

2011-08-12 Thread Vladislav Bogdanov
... I would really like someone that has these process pause problems to test a patch I have posted to see if it rectifies the situation. Our significant QE team at Red Hat doesn't see these problems and I can't generate them in engineering. It is possible your device drivers are taking

[Pacemaker] After pacemaker is stopped on DC, it gets fenced because last 'stop' operation is lost

2011-08-16 Thread Vladislav Bogdanov
Hi, I recently discovered that when I stop pacemaker on DC node (for upgrade), that node is shortly fenced by a new DC. Fencing is caused by not-stopped (from the PoV of new DC) clone instance (dlm in my case) which was actually stopped: Aug 16 08:56:29 v03-a pengine: [2083]: WARN:

Re: [Pacemaker] After pacemaker is stopped on DC, it gets fenced because last 'stop' operation is lost

2011-08-17 Thread Vladislav Bogdanov
17.08.2011 07:17, Andrew Beekhof wrote: On Tue, Aug 16, 2011 at 7:21 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi, I recently discovered that when I stop pacemaker on DC node (for upgrade), that node is shortly fenced by a new DC. Fencing is caused by not-stopped (from the PoV

Re: [Pacemaker] A question and demand to a resource placement strategy function

2011-08-22 Thread Vladislav Bogdanov
Hi Yan, 27.04.2011 08:14, Yan Gao wrote: [snip] Do priorities work for utilization strategy? Yes, the improvement works for utilization, minimal and balanced strategy: - The nodes that are more healthy and have more capacities get consumed first (globally preferred nodes). Does this still

Re: [Pacemaker] A question and demand to a resource placement strategy function

2011-08-24 Thread Vladislav Bogdanov
23.08.2011 12:19, Gao,Yan wrote: [snip] When allocating every resource, we compare the capacity of the nodes. The node has more remaining capacity is preferred. This would be quite clear if we only define one kind of capacity. While if we define multiple kinds of capacity, for example: If

[Pacemaker] pacemaker/dlm problems

2011-09-06 Thread Vladislav Bogdanov
Hi Andrew, hi all, I'm further investigating dlm lockspace hangs I described in https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html and in the thread starting from https://lists.linux-foundation.org/pipermail/openais/2011-September/016701.html . What I described there is

Re: [Pacemaker] pacemaker/dlm problems

2011-09-26 Thread Vladislav Bogdanov
Hi Andrew, 26.09.2011 10:10, Andrew Beekhof wrote: On Tue, Sep 6, 2011 at 5:27 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi Andrew, hi all, I'm further investigating dlm lockspace hangs I described in https://www.redhat.com/archives/cluster-devel/2011-August/msg00133.html

Re: [Pacemaker] pacemaker/dlm problems

2011-09-26 Thread Vladislav Bogdanov
26.09.2011 11:16, Andrew Beekhof wrote: [snip] Regardless, for 1.1.6 the dlm would be better off making a call like: rc = st->cmds->fence(st, st_opts, target, "reboot", 120); from fencing/admin.c That would talk directly to the fencing daemon, bypassing attrd, crmd and PE - and

Re: [Pacemaker] pacemaker/dlm problems

2011-09-27 Thread Vladislav Bogdanov
27.09.2011 08:59, Andrew Beekhof wrote: [snip] I agree with Jiaju (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html), that could be solely pacemaker problem, because it probably should originate fencing itself is such situation I think. So, using pacemaker/dlm

Re: [Pacemaker] pacemaker/dlm problems

2011-09-27 Thread Vladislav Bogdanov
27.09.2011 10:56, Andrew Beekhof wrote: On Tue, Sep 27, 2011 at 5:07 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 27.09.2011 08:59, Andrew Beekhof wrote: [snip] I agree with Jiaju (https://lists.linux-foundation.org/pipermail/openais/2011-September/016713.html), that could be solely

Re: [Pacemaker] pacemaker/dlm problems

2011-09-28 Thread Vladislav Bogdanov
Hi, 27.09.2011 10:56, Andrew Beekhof wrote: [snip] All the more reason to start using the stonith api directly. I was playing around list night with the dlm_controld.pcmk code: https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787 Doesn't seem to apply to 3.0.17,

Re: [Pacemaker] [Partially SOLVED] pacemaker/dlm problems

2011-09-28 Thread Vladislav Bogdanov
Hi Andrew, All the more reason to start using the stonith api directly. I was playing around list night with the dlm_controld.pcmk code: https://github.com/beekhof/dlm/commit/9f890a36f6844c2a0567aea0a0e29cc47b01b787 Doesn't seem to apply to 3.0.17, so I rebased that commit against it

Re: [Pacemaker] pacemaker/dlm problems

2011-10-02 Thread Vladislav Bogdanov
03.10.2011 04:41, Andrew Beekhof wrote: [...] If pacemaker fully finish processing of one membership change - elect new DC on a quorate partition, and do not try to take over dc role (or release it) on a non-quorate partition if quorate one exists, that problem could be gone. Non quorate

Re: [Pacemaker] pacemaker/dlm problems

2011-10-03 Thread Vladislav Bogdanov
03.10.2011 10:56, Andrew Beekhof wrote: On Mon, Oct 3, 2011 at 3:34 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 03.10.2011 04:41, Andrew Beekhof wrote: [...] If pacemaker fully finish processing of one membership change - elect new DC on a quorate partition, and do not try to take over

Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-10-03 Thread Vladislav Bogdanov
Hi, 29.09.2011 17:47, Nick Khamis wrote: Hello Dejan, Sorry to hijack, I am also working on the same type of setup as a prototype. What is the best way to get stonith included for VM setups? Maybe an SSH stonith? Again, this is just for the prototype. You may look at fence-virt. I use

Re: [Pacemaker] Dual-Primary DRBD with OCFS2 on SLES 11 SP1

2011-10-04 Thread Vladislav Bogdanov
hosts will list that vm's in qpid, and fence_virt is not smart enough to deal with that. Vladislav Nick. On Mon, Oct 3, 2011 at 3:39 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi, 29.09.2011 17:47, Nick Khamis wrote: Hello Dejan, Sorry to hijack, I am also working on the same

Re: [Pacemaker] pcmk + corosync + cman for dlm support?

2011-10-28 Thread Vladislav Bogdanov
28.10.2011 04:04, Nick Khamis wrote: Hello Everyone, I just want to make sure this is still the case before I go through with it. I am trying to setup an active/active using: Corosync 1.4.2 Pacemaker 1.1.6 Cluster3 DRBD 8.3.7 OCFS2 The only reason I installed Cluster3 was for dlm

Re: [Pacemaker] [Linux-HA] pcmk + corosync + cman for dlm support?

2011-11-02 Thread Vladislav Bogdanov
02.11.2011 16:36, Nick Khamis wrote: Vladislav, Thank you so much for your response. Just to make sure, all I need is to: * Apply the three patches to cman. Found here http://www.gossamer-threads.com/lists/linuxha/pacemaker/75164?do=post_view_threaded;. * Recompile CMAN * Do I have to

Re: [Pacemaker] [Linux-HA] pcmk + corosync + cman for dlm support?

2011-11-03 Thread Vladislav Bogdanov
03.11.2011 15:37, Nick Khamis wrote: Hello Vlad, Thank you so much for your response. I am experiencing the same hang as well. Did you have better luck with GFS2, or any other network file system? If you see almost simultaneous kernel panic on all cluster nodes, then you probably hit the

Re: [Pacemaker] [Partially SOLVED] pacemaker/dlm problems

2011-11-14 Thread Vladislav Bogdanov
be you remember? Best, Vladislav 28.09.2011 17:41, Vladislav Bogdanov wrote: Hi Andrew, All the more reason to start using the stonith api directly. I was playing around last night with the dlm_controld.pcmk code: https://github.com/beekhof/dlm/commit

Re: [Pacemaker] [Partially SOLVED] pacemaker/dlm problems

2011-11-14 Thread Vladislav Bogdanov
to zero if node has been seen after it was fenced and then appeared again? 14.11.2011 23:36, Vladislav Bogdanov wrote: Hi Andrew, I just found another problem with dlm_controld.pcmk (with your latest patch from github applied and also my fixes to actually build it - they are included

[Pacemaker] Reload does not work with current github tree (2d8fad5)

2011-11-23 Thread Vladislav Bogdanov
Hi Andrew, all Just noticed that reload action does not happen when resource definition change: Nov 23 08:16:08 v03-a pengine: [2091]: CRIT: check_action_definition: Parameters to c5-x64-devel.vds-ok.com-vm_monitor_1 on v03-b changed: recorded 94f8fd587de8d9dd8454443cbde11b4e vs.

Re: [Pacemaker] Reload does not work with current github tree (2d8fad5)

2011-11-23 Thread Vladislav Bogdanov
Hi Andreas, 23.11.2011 13:13, Andreas Kurz wrote: On 11/23/2011 09:30 AM, Vladislav Bogdanov wrote: Hi Andrew, all Just noticed that reload action does not happen when resource definition change: Nov 23 08:16:08 v03-a pengine: [2091]: CRIT: check_action_definition: Parameters to c5-x64

Re: [Pacemaker] Reload does not work with current github tree (2d8fad5)

2011-11-23 Thread Vladislav Bogdanov
24.11.2011 07:37, Andrew Beekhof wrote: On Wed, Nov 23, 2011 at 7:30 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi Andrew, all Just noticed that reload action does not happen when resource definition change: Nov 23 08:16:08 v03-a pengine: [2091]: CRIT: check_action_definition

Re: [Pacemaker] [Partially SOLVED] pacemaker/dlm problems

2011-11-23 Thread Vladislav Bogdanov
24.11.2011 08:49, Andrew Beekhof wrote: On Thu, Nov 24, 2011 at 3:58 PM, Vladislav Bogdanov bub...@hoster-ok.com wrote: 24.11.2011 07:33, Andrew Beekhof wrote: On Tue, Nov 15, 2011 at 7:36 AM, Vladislav Bogdanov bub...@hoster-ok.com wrote: Hi Andrew, I just found another problem

Re: [Pacemaker] Fencing libvirt/KVM nodes running on different hosts?

2011-11-28 Thread Vladislav Bogdanov
28.11.2011 22:55, Andreas Ntaflos wrote: Hi, Scenario: two physical virtualisation hosts run various KVM-based virtual machines, managed by Libvirt. Two VMs, one on each host, form a Pacemaker cluster, say for a simple database server, using DRBD and a virtual/cluster IP address. Using

Re: [Pacemaker] CLVM Pacemaker Corosync on Ubuntu Omeiric Server

2011-11-30 Thread Vladislav Bogdanov
30.11.2011 14:08, Vadim Bulst wrote: Hello, first of all I'd like to ask you a general question: Does somebody successfully set up a clvm cluster with pacemaker and run it in productive mode? I will say yes after I finally resolve remaining dlmfencing issues. Now back to the concrete

Re: [Pacemaker] CLVM Pacemaker Corosync on Ubuntu Omeiric Server

2011-11-30 Thread Vladislav Bogdanov
over accidentally found one, but it does its function for me. Set 'avoid_lck' to 1. Best, Vladislav Am 30.11.2011 13:10, schrieb Vadim Bulst: Am 30.11.2011 12:22, schrieb Vladislav Bogdanov: 30.11.2011 14:08, Vadim Bulst wrote: Hello, first of all I'd like to ask you a general question

[Pacemaker] Excessive migrate_from is run after migrate_to failed

2011-12-01 Thread Vladislav Bogdanov
Hi Andrew, all, I found that pacemaker runs migrate_from on a migration destination node even if preceding migrate_to command failed (github master). Is it intentional? hb_report? Best, Vladislav
