Hi Bin,

Bin Chen(sunwen_ling) wrote:
> Hi Dan,
>
> Thanks for your reply. For me, I didn't set up any complex rules; I just set
> up a two-node cluster and configured a dummy resource to observe the behavior.
> I am wondering whether this behavior is the default if I didn't specify
> anything (no resource constraints, no location constraints). Why does
> Heartbeat behave like this? When M1 stops and starts, the resource is
> migrated twice, which seems unnecessary. I mean, if I had set "I prefer the
> resource to be running on M1" I could understand the behavior, but since I
> didn't set anything, it should treat M1 and M2 as the same machine, right?
>   
When you don't explicitly set anything on a resource, it takes the 
default allocation score:

# ptest -Ls
Allocation scores:
native_color: test-binch allocation score on cluster1: 0
native_color: test-binch allocation score on cluster2: 0

The real question is: when a resource has an equal score on both nodes, 
how does Pacemaker decide where to allocate it? The answer is: it depends.

If you have just one resource, Pacemaker chooses the node with the 
lowest uname, as compared by strcmp().
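strcmp() ordering is plain byte-wise comparison, which is the same order `sort` produces in the C locale, so you can preview which uname would win the tiebreak from the shell (the node names here are just the ones from this thread):

```shell
# strcmp()-style byte-wise ordering is what `sort` gives in the C locale;
# the first line printed is the uname Pacemaker would favor as tiebreaker.
printf '%s\n' cluster2 cluster1 | LC_ALL=C sort | head -n1
# prints: cluster1
```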

I've replicated your setup (using corosync instead of Heartbeat) with a 
dummy resource, and it behaves exactly the way you've described:

1. 2 nodes, cluster1 and cluster2, and a resource Dummy
2. cluster1 and cluster2 start, Dummy starts on cluster1
3. echo b > /proc/sysrq-trigger on cluster1, resource moves to cluster2
4. cluster1 comes back, resource moves to cluster1

This is the expected behavior, and if you're still wondering why it does 
that, think of the actions and the order they occur in.

First both nodes start, then you allocate a resource to the cluster 
without specifying a location constraint or score. Pacemaker automagically 
sets the score to zero, but it also has to decide on a location; with just 
one resource, it chooses the node with the lowest uname (cluster1). Cluster1 
disappears, the lowest remaining uname is cluster2, so the resource moves. 
Cluster1 comes back, the lowest uname is again cluster1, so the resource 
moves back.
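As suggested earlier in the thread, if you'd rather the resource stay on the surviving node instead of failing back, a non-zero resource-stickiness breaks the 0-0 tie in favor of the current location. A minimal sketch with the crm shell (the value 100 is arbitrary; with no location constraints, any positive score keeps the resource put):

```shell
# Give all resources a default stickiness of 100; the node currently
# running Dummy then scores 100 vs. 0, so nothing moves when cluster1 rejoins.
crm configure rsc_defaults resource-stickiness=100

# Re-check the allocation scores afterwards.
ptest -Ls
```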

I've also done another test.

1. Shutdown corosync on cluster1 (Dummy moves to cluster2)
2. On cluster2, delete node cluster1
3. On cluster1, permanently change hostname to cluster3 (echo 
"kernel.hostname = cluster3" >> /etc/sysctl.conf && sysctl -p)
4. Start corosync on cluster3 (Dummy stays on cluster2, because it has 
the lowest uname, score is still 0 on both nodes for resource Dummy)
5. echo b > /proc/sysrq-trigger on cluster2 (resource Dummy moves to 
cluster3, lowest uname)
6. Cluster2 restarts (Dummy resource moves back to cluster2, lowest uname)

So when it comes to a single resource with equal scores, the tiebreaker 
is the lowest uname as compared by strcmp(), and it preempts the resource's 
location (by comparison, the Designated Coordinator, DC, is not preempted, 
which is a good thing, but that's another discussion).

What happens when you have 2 resources of equal cost? Pacemaker tries to 
spread the resources "evenly" across the available nodes.

# ptest -Ls
Allocation scores:
native_color: test-binch allocation score on cluster2: 0
native_color: test-binch allocation score on cluster3: 0
native_color: test-binch-2 allocation score on cluster2: 0
native_color: test-binch-2 allocation score on cluster3: 0

r...@cluster2:~# ptest -LsVVV 2>&1 | grep Leave
ptest[6460]: 2010/12/03_13:13:47 notice: LogActions: Leave resource 
test-binch  (Started cluster2)
ptest[6460]: 2010/12/03_13:13:47 notice: LogActions: Leave resource 
test-binch-2        (Started cluster3)
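For completeness, ptest doesn't have to run against the live cluster; it can also read a saved CIB, which is handy for checking placement decisions offline. This sketch assumes the -x/--xml-file option of the Pacemaker 1.0 ptest, and /tmp/cib.xml is just an example path:

```shell
# Dump the live CIB to a file, then ask ptest how it would place
# resources, without touching the running cluster.
cibadmin -Q > /tmp/cib.xml
ptest -x /tmp/cib.xml -s -VVV 2>&1 | grep Leave
```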

HTH,
Dan

p.s.: imagine that I've said score wherever I said cost, it's an old 
habit I've developed from saying "equal cost {routes,load balancing}" 
too many times.
> Looking forward to your reply.
>
> Thanks.
> Bin
>
> The CIB:
>
> <?xml version="1.0" ?>
> <cib admin_epoch="0" crm_feature_set="3.0.1"
> dc-uuid="d111371b-51bd-41f0-a764-4e2f7616e47a" epoch="10" have-quorum="1"
> num_updates="313" validate-with="pacemaker-1.0">
>   <configuration>
>     <crm_config>
>       <cluster_property_set id="cib-bootstrap-options">
>         <nvpair id="cib-bootstrap-options-dc-version" name="dc-version"
> value="1.0.10-da7075976b5ff0bee71074385f8fd02f296ec8a3"/>
>         <nvpair id="cib-bootstrap-options-cluster-infrastructure"
> name="cluster-infrastructure" value="Heartbeat"/>
>         <nvpair id="cib-bootstrap-options-stonith-enabled"
> name="stonith-enabled" value="false"/>
>       </cluster_property_set>
>     </crm_config>
>     <rsc_defaults/>
>     <op_defaults/>
>     <nodes>
>       <node id="d111371b-51bd-41f0-a764-4e2f7616e47a" type="normal"
> uname="xcp-3"/>
>       <node id="51fbafc2-2ca9-4123-b1e0-43927f6eccb6" type="normal"
> uname="xcp-1"/>
>     </nodes>
>     <resources>
>       <primitive class="ocf" id="test-binch" provider="heartbeat"
> type="binch">
>         <operations>
>           <op id="test-binch-monitor-3s" interval="3s" name="monitor"/>
>         </operations>
>       </primitive>
>     </resources>
>     <constraints/>
>   </configuration>
> </cib>
>
>
> On Thu, Dec 2, 2010 at 8:03 PM, Dan Frincu <[email protected]> wrote:
>
>   
>> Hi,
>>
>> Bin Chen(sunwen_ling) wrote:
>>     
>>> Hi guys,
>>>
>>> I have configured 2 machines, M1 and M2. The case is:
>>>
>>> 1) M1 starts, M2 starts, resource running on M1
>>> 2) M1 poweroff, resource running on M2
>>> 3) M1 poweron, resource migrated to M1 from M2
>>>
>>> In step 3, I want to leave the resource running on M2 and just make
>>> M1 the passive node. How can I achieve that?
>>>
>> Set the default resource-stickiness or the individual resource
>> stickiness to a value higher than the location constraint. If the
>> resource is part of a group, set the resource-stickiness to a value
>> higher than the cumulated score of the group.
>>
>> Regards,
>> Dan
>>     
>>> Thanks.
>>> Bin
>>> _______________________________________________
>>> Linux-HA mailing list
>>> [email protected]
>>> http://lists.linux-ha.org/mailman/listinfo/linux-ha
>>> See also: http://linux-ha.org/ReportingProblems
>>>
>>>       
>> --
>> Dan FRINCU
>> Systems Engineer
>> CCNA, RHCE
>> Streamwide Romania
>>
>>
>>     
>   

-- 
Dan FRINCU
Systems Engineer
CCNA, RHCE
Streamwide Romania

