Re: [Pacemaker] Unique clone instance is stopped too early on move

Vladislav Bogdanov Thu, 16 Apr 2015 23:37:17 -0700

17.04.2015 00:48, Andrew Beekhof wrote:

On 22 Jan 2015, at 12:04 am, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:

20.01.2015 02:44, Andrew Beekhof wrote:

On 16 Jan 2015, at 3:59 pm, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:

16.01.2015 07:44, Andrew Beekhof wrote:

On 15 Jan 2015, at 3:11 pm, Vladislav Bogdanov <bub...@hoster-ok.com> wrote:

13.01.2015 11:32, Andrei Borzenkov wrote:

On Tue, Jan 13, 2015 at 10:20 AM, Vladislav Bogdanov
<bub...@hoster-ok.com> wrote:

Hi Andrew, David, all.

I found a somewhat strange operation ordering during transition execution.

Could you please look at the following partial configuration (crmsh syntax)?

===
...
clone cl-broker broker \
         meta interleave=true target-role=Started
clone cl-broker-vips broker-vips \
         meta clone-node-max=2 globally-unique=true interleave=true \
         resource-stickiness=0 target-role=Started
clone cl-ctdb ctdb \
         meta interleave=true target-role=Started
colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
colocation broker-with-ctdb inf: cl-broker cl-ctdb
order broker-after-ctdb inf: cl-ctdb cl-broker
order broker-vips-after-broker 0: cl-broker cl-broker-vips
...
===

After I put one node into standby and then back online, I see the following
transition (relevant excerpt):

===
  * Pseudo action:   cl-broker-vips_stop_0
  * Resource action: broker-vips:1   stop on c-pa-0
  * Pseudo action:   cl-broker-vips_stopped_0
  * Pseudo action:   cl-ctdb_start_0
  * Resource action: ctdb            start on c-pa-1
  * Pseudo action:   cl-ctdb_running_0
  * Pseudo action:   cl-broker_start_0
  * Resource action: ctdb            monitor=10000 on c-pa-1
  * Resource action: broker          start on c-pa-1
  * Pseudo action:   cl-broker_running_0
  * Pseudo action:   cl-broker-vips_start_0
  * Resource action: broker          monitor=10000 on c-pa-1
  * Resource action: broker-vips:1   start on c-pa-1
  * Pseudo action:   cl-broker-vips_running_0
  * Resource action: broker-vips:1   monitor=30000 on c-pa-1
===

What could be the reason for stopping a unique clone instance so early for a move?


Do not take it as a definitive answer, but cl-broker-vips cannot run
unless both other resources are started. So if you compute the closure of
all required transitions, it looks rather logical. Having
cl-broker-vips started while broker is still stopped would violate
the constraint.


The problem is that broker-vips:1 is stopped on one (source) node unnecessarily
early.


It looks to be moving from c-pa-0 to c-pa-1.
It might be unnecessarily early, but it is what you asked for... we have to
unwind the resource stack before we can build it up.


Yes, I understand that it is valid, but could its stop be delayed until the
cluster is in a state where all dependencies are satisfied to start it on
another node (like migration)?


No, because "we have to unwind the resource stack before we can build it up."
Doing anything else would be one of those things that is trivial for a human to 
identify but rather complex for a computer.
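
To illustrate "unwind before build up": with symmetrical ordering constraints, stops are implied in the reverse of the start order. The following is only a toy model under that assumption, not Pacemaker's policy engine; the helper names `start_order` and `stop_order` are hypothetical, and the constraint pairs are copied from the configuration quoted earlier in the thread.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Ordering constraints from the quoted configuration, as (first, then) pairs.
ORDERS = [
    ("cl-ctdb", "cl-broker"),         # order broker-after-ctdb
    ("cl-broker", "cl-broker-vips"),  # order broker-vips-after-broker
]


def start_order(orders):
    """Return resources in an order that satisfies every (first, then) pair."""
    sorter = TopologicalSorter()
    for first, then in orders:
        sorter.add(then, first)  # "then" may only start after "first"
    return list(sorter.static_order())


def stop_order(orders):
    """Toy assumption: stops happen in the exact reverse of the start order."""
    return list(reversed(start_order(orders)))


if __name__ == "__main__":
    print("start:", start_order(ORDERS))
    print("stop: ", stop_order(ORDERS))
```

In this model cl-broker-vips sits at the top of the stack, so it is the first thing to be stopped and the last to be started, which matches the shape of the transition in the original report.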


I believe there is also an issue with migration of clone instances.

I modified the pe-input to allow migration of cl-broker-vips (and also set an
inf score for broker-vips-after-broker and made cl-broker-vips interleaved).
The relevant part is:
clone cl-broker broker \
        meta interleave=true target-role=Started
clone cl-broker-vips broker-vips \
        meta clone-node-max=2 globally-unique=true interleave=true \
        allow-migrate=true resource-stickiness=0 target-role=Started
clone cl-ctdb ctdb \
        meta interleave=true target-role=Started
colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
colocation broker-with-ctdb inf: cl-broker cl-ctdb
order broker-after-ctdb inf: cl-ctdb cl-broker
order broker-vips-after-broker inf: cl-broker cl-broker-vips

After that, (part of) the transition is:

* Resource action: broker-vips:1   migrate_to on c-pa-0
* Pseudo action:   cl-broker-vips_stop_0
* Resource action: broker-vips:1   migrate_from on c-pa-1
* Resource action: broker-vips:1   stop on c-pa-0
* Pseudo action:   cl-broker-vips_stopped_0
* Pseudo action:   all_stopped
* Pseudo action:   cl-ctdb_start_0
* Resource action: ctdb            start on c-pa-1
* Pseudo action:   cl-ctdb_running_0
* Pseudo action:   cl-broker_start_0
* Resource action: ctdb            monitor=10000 on c-pa-1
* Resource action: broker          start on c-pa-1
* Pseudo action:   cl-broker_running_0
* Pseudo action:   cl-broker-vips_start_0
* Resource action: broker          monitor=10000 on c-pa-1
* Pseudo action:   broker-vips:1_start_0
* Pseudo action:   cl-broker-vips_running_0
* Resource action: broker-vips:1   monitor=30000 on c-pa-1

But I would say that, at least from a human-logic PoV, the above breaks the
ordering rule broker-vips-after-broker (cl-broker-vips has finished migrating,
and thus runs on c-pa-1, before cl-broker is started there).
Technically broker-vips:1_start_0 is in the right position, but the resource is
actually "started" in migrate_to/migrate_from.
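
The concern above can be checked mechanically against the printed transition. This is a minimal sketch, assuming the listed order reflects execution order (the listing alone does not prove that); the regular expression and the helper names `resource_actions` and `runs_before` are hypothetical, and the sample text is a shortened copy of the excerpt above.

```python
import re

# Shortened copy of the transition excerpt quoted above.
TRANSITION = """\
* Resource action: broker-vips:1   migrate_to on c-pa-0
* Pseudo action:   cl-broker-vips_stop_0
* Resource action: broker-vips:1   migrate_from on c-pa-1
* Resource action: broker-vips:1   stop on c-pa-0
* Resource action: ctdb            start on c-pa-1
* Resource action: broker          start on c-pa-1
* Pseudo action:   broker-vips:1_start_0
"""

# Matches "* Resource action: <resource> <operation> on <node>" lines only.
ACTION = re.compile(r"\*\s+Resource action:\s+(\S+)\s+(\S+)\s+on\s+(\S+)")


def resource_actions(text):
    """Return (resource, operation, node) tuples in the order they are listed."""
    matches = (ACTION.match(line) for line in text.splitlines())
    return [m.groups() for m in matches if m]


def runs_before(actions, first, second):
    """True if action `first` is listed earlier than action `second`."""
    return actions.index(first) < actions.index(second)


actions = resource_actions(TRANSITION)
# The VIP effectively becomes active on c-pa-1 at migrate_from, so compare
# that action (rather than the pseudo start) with the broker start.
violated = runs_before(
    actions,
    ("broker-vips:1", "migrate_from", "c-pa-1"),
    ("broker", "start", "c-pa-1"),
)
print("broker-vips-after-broker violated:", violated)
```

Comparing migrate_from instead of the start pseudo-action is the point of the check: the pseudo-action sits in the right place, but the real work has already happened.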


I also went further and injected a pair of non-clone IPaddr2 resources into the
same pe-input, and also enabled migration for them (setting interleave for
cl-broker-vips back to false and the ordering score for
broker-vips-after-broker back to 0, so all three order constraints are adjacent):

clone cl-broker broker \
        meta interleave=true target-role=Started
clone cl-broker-vips broker-vips \
        meta clone-node-max=2 globally-unique=true interleave=false \
        allow-migrate=true resource-stickiness=0 target-role=Started
clone cl-ctdb ctdb \
        meta interleave=true target-role=Started
primitive broker-vip1 IPaddr2 \
        params ip=192.168.122.70 cidr_netmask=24 nic=eth0 \
        op start interval=0 timeout=20 \
        op stop interval=0 timeout=20 \
        op monitor interval=30
primitive broker-vip2 IPaddr2 \
        params ip=192.168.122.71 cidr_netmask=24 nic=eth0 \
        op start interval=0 timeout=20 \
        op stop interval=0 timeout=20 \
        op monitor interval=30
colocation broker-with-ctdb inf: cl-broker cl-ctdb
colocation broker-vips-with-broker inf: cl-broker-vips cl-broker
colocation broker-vip1-with-broker inf: broker-vip1 cl-broker
colocation broker-vip2-with-broker inf: broker-vip2 cl-broker
colocation broker-vip2-not-with-vip1 -100: broker-vip2 broker-vip1
order broker-after-ctdb inf: cl-ctdb cl-broker
order broker-vips-after-broker 0: cl-broker cl-broker-vips
order broker-vip1-after-broker 0: cl-broker broker-vip1
order broker-vip2-after-broker 0: cl-broker broker-vip2

For broker-vip2 I see completely different output (compare with broker-vips:1):

* Resource action: broker-vips:1   migrate_to on c-pa-0


I just noticed this, since when does IPaddr2 migrate?

I just injected allow-migrate for broker-vip1, broker-vip2 and broker-vips into
the pe-input to test what pengine would do, but forgot to note that (actually
the cl-broker-vips definition above has it enabled but broker-vip{1,2} lack it,
damn, my fault, it should be there too). I need to be more accurate.

For a g-u clone it doesn't solve the issue, btw. But for an ordinary resource it
does. That makes me think that migration paths differ for g-u clone instances.

Actually, implementing (pseudo-)migration in IPaddr2 doesn't seem to be a very
complex task.


The reason I noticed is that broker-vips definitely doesn’t start until the end
anymore:

  * Resource action: broker          start on c-pa-1
  * Pseudo action:   cl-broker_running_0
  * Pseudo action:   cl-broker-vips_start_0
  * Resource action: broker          monitor=10000 on c-pa-1
  * Resource action: broker-vips:1   start on c-pa-1

Actually it is migrated at the very beginning of the transition, and that seems
to be a big issue to me, because it breaks ordering (start became a
pseudo-action, but the actual work should be done in migrate_from, which is run
before broker start).




_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org


