HA Version 2.1.4, CRM Version 2.0 (CIB feature set 2.0) node: aa909246edb386137b986c5773344b98c6969999

Okay, so you should not hit the group bug I was referring to.

I think that may be where the disconnect is happening. I ran ptest -Ls as you suggested and found that for my service, the score on the primary node is 0 and the score on the secondary node is -1000000. Could that be because I am also setting the default-resource-stickiness to -INFINITY, as I interpreted the FAQ to suggest?

It is very strange, actually; I appear to have two sets of scores. ptest -Ls output gives me two sets of values appended to each other. Here is my simplification of the output exactly as it appears:

group_1 allocation score on secondary: 0
group_1 allocation score on primary: 0
IP address allocation score on secondary: 0
IP address allocation score on primary: 100
DRBD allocation score on secondary: 0
DRBD allocation score on primary: 0
Filesystem allocation score on secondary: 0
Filesystem allocation score on primary: 0
PostgreSQL allocation score on secondary: 0
PostgreSQL allocation score on primary: 0
Kamailio (my custom service) allocation score on secondary: 0
Kamailio (my custom service) allocation score on primary: 0
IP address allocation score on secondary: 0
IP address allocation score on primary: 100
DRBD allocation score on secondary: -1000000
DRBD allocation score on primary: 0
Filesystem allocation score on secondary: -1000000
Filesystem allocation score on primary: 0
PostgreSQL allocation score on secondary: -1000000
PostgreSQL allocation score on primary: 0
Kamailio (my custom service) allocation score on secondary: -1000000
Kamailio (my custom service) allocation score on primary: 0

This output and the config you pasted below are not from the same cluster, or at least not from the same configuration: the names are not identical and the values don't match.

Please get that straightened out first.

If this is a test cluster, you can start over from a clean state like this:

# stop heartbeat
/etc/init.d/heartbeat stop
# wait until all nodes shut down
# remove any old xml config
rm /var/lib/heartbeat/crm/*
# then start all nodes again
/etc/init.d/heartbeat start

It never hurts to attach your entire config; maybe something else is causing this.

cibadmin -Ql > my.xml

OK. Here is something else very strange: when I look at my cib.xml, I see the -INFINITY values I've assigned for default-resource-stickiness and default-resource-failure-stickiness. But when I look at the output of cibadmin -Q, I see them set to 0. What gives?

cib.xml: http://pastebin.com/f610c9a0a

<nvpair id="cib-bootstrap-options-default-resource-stickiness" name="default-resource-stickiness" value="-INFINITY"/>

That's probably not what you want. It would mean every successfully started resource receives a score of -INFINITY, and since a resource with a negative score is not allowed to run on a node, it would be stopped again. I imagine that would end up in a stop/start loop (although I didn't try :).
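As a sketch of a saner setting (the value 100 here is just an example; pick whatever mild preference for staying put suits you), the nvpair would instead look like:

```xml
<nvpair id="cib-bootstrap-options-default-resource-stickiness"
        name="default-resource-stickiness" value="100"/>
```

A small positive stickiness means a running resource slightly prefers its current node, without pinning it there against stronger constraints.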

cibadmin -Q: http://pastebin.com/f21748793
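To compare the two views side by side (a quick sanity check, assuming the usual on-disk path /var/lib/heartbeat/crm/cib.xml), something like:

```shell
# What the running CRM believes:
cibadmin -Q | grep default-resource-stickiness
# What is written on disk:
grep default-resource-stickiness /var/lib/heartbeat/crm/cib.xml
```

If the two disagree, the live CIB is the one the policy engine actually uses; the on-disk file may simply be stale or have been edited by hand without being reloaded.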
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
