Re: [Linux-HA] the node cannot be a standby when its clone peer is unmanaged

Andrew Beekhof Wed, 03 Sep 2008 00:30:40 -0700

On Wed, Sep 3, 2008 at 07:57, Junko IKEDA <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I found the following clone behavior.
> The clone resource was set with globally_unique="false".
>
> First, there is one group resource (non_clone_group1) and one cloned group
> resource (clone1) on node-a.
> non_clone_group1 and clone1 have an ordering constraint like this:
> <rsc_order id="order_resource1_clone1" from="clone1" action="start"
> type="before" to="non_clone_group1" score="0"/>
>
> # crm_mon -i1
> Refresh in 1s...
>
> ============
> Last updated: Wed Sep  3 13:57:55 2008
> Current DC: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3)
> 2 Nodes configured.
> 2 Resources configured.
> ============
>
> Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
> Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): online
>
> Resource Group: non_clone_group1
>    group1-dummy1       (ocf::heartbeat:Dummy1):        Started node-a
>    group1-dummy2       (ocf::heartbeat:Dummy2):        Started node-a
> Clone Set: clone1
>    Resource Group: clone_group1:0
>        clone1-dummy3:0 (ocf::heartbeat:Dummy3):        Started node-b
>        clone1-dummy4:0 (ocf::heartbeat:Dummy4):        Started node-b
>    Resource Group: clone_group1:1
>        clone1-dummy3:1 (ocf::heartbeat:Dummy3):        Started node-a
>        clone1-dummy4:1 (ocf::heartbeat:Dummy4):        Started node-a
>
> By the way,
> Dummy1, Dummy2, Dummy3 and Dummy4 are copies of the Dummy RA,
> just renamed.
>
> I also modified Dummy1 so that its stop operation fails.
> So Dummy1 fails when node-a is set to standby.
> It worked as I expected.
>
>
> Refresh in 1s...
>
> ============
> Last updated: Wed Sep  3 13:58:26 2008
> Current DC: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3)
> 2 Nodes configured.
> 2 Resources configured.
> ============
>
> Node: node-b (59295d90-5459-490d-a1e0-d48810cf2fb3): online
> Node: node-a (b3852a23-c10b-440a-a8e0-263b0185d657): standby
>
> Resource Group: non_clone_group1
>    group1-dummy1       (ocf::heartbeat:Dummy1):        Started node-a (unmanaged) FAILED
>    group1-dummy2       (ocf::heartbeat:Dummy2):        Stopped
> Clone Set: clone1
>    Resource Group: clone_group1:0
>        clone1-dummy3:0 (ocf::heartbeat:Dummy3):        Started node-b
>        clone1-dummy4:0 (ocf::heartbeat:Dummy4):        Started node-b
>    Resource Group: clone_group1:1
>        clone1-dummy3:1 (ocf::heartbeat:Dummy3):        Started node-a
>        clone1-dummy4:1 (ocf::heartbeat:Dummy4):        Started node-a
>
> Failed actions:
>    group1-dummy1_stop_0 (node=node-a, call=17, rc=1): complete
>
>
> After that, I set node-b to standby, too.
> node-b became a standby node, but its resources could not be stopped and
> kept running.
> The ha-log seems to say something like "the stop operation for
> group1-dummy1 is pending on node-a, so the transition cannot go forward."
> group1-dummy1 was not running on node-b, so there should be no problem if
> node-b's resources stop regardless of group1-dummy1.
> I know that the "unmanaged" status is really dangerous, so if I face it, I
> should run "crm_resource -C" to recover from it.
> But it would be better if the peer node could stop its clone resources in
> a case like this.


I agree in principle, however ordering constraints are, in theory,
independent of location.
Which means that a constraint like "stop groupA before cloneB" doesn't
imply that it is safe to stop any instances of B not on the same node
as A at any time.

We do have the "interleave" hint for clones, which sometimes approximates
the behavior you're looking for, but it's not as smart as it could be
and only works between two clone (or master/slave) resources.
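
For illustration only, here is a rough sketch of where the "interleave"
hint would go when both sides of the ordering constraint are clones. The
ids cloneA and cloneB are made up, the fragment is not taken from the
cluster in this thread, and setting the hint on both clones is an
assumption. Depending on the Heartbeat version, the nvpair may need to sit
in instance_attributes rather than meta_attributes. It does not help with
the configuration above, because non_clone_group1 is not a clone.

```xml
<!-- Hypothetical ids; a sketch only, not a tested configuration. -->

<!-- In the resources section: -->
<clone id="cloneA">
  <meta_attributes id="cloneA-meta">
    <attributes>
      <nvpair id="cloneA-interleave" name="interleave" value="true"/>
    </attributes>
  </meta_attributes>
  <!-- the cloned primitive or group goes here -->
</clone>

<clone id="cloneB">
  <meta_attributes id="cloneB-meta">
    <attributes>
      <nvpair id="cloneB-interleave" name="interleave" value="true"/>
    </attributes>
  </meta_attributes>
  <!-- the cloned primitive or group goes here -->
</clone>

<!-- In the constraints section, using the same syntax as the original
     post: start cloneA before cloneB. -->
<rsc_order id="order_cloneA_cloneB" from="cloneA" action="start"
           type="before" to="cloneB" score="0"/>
```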

So for now this is "as designed", but I'd encourage you to log an
enhancement request for it in bugzilla (with the hb_report attached).
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
