On Fri, Apr 13, 2012 at 11:47 AM, Andrew Beekhof <[email protected]> wrote:
> On Thu, Apr 12, 2012 at 5:26 PM, Lars Ellenberg
> <[email protected]> wrote:
>> On Wed, Apr 11, 2012 at 08:22:59AM +1000, Andrew Beekhof wrote:
>>> It looks like the drbd RA is calling crm_master during the monitor action.
>>> That wouldn't seem like a good idea as the value isn't counted until
>>> the resource is started and if the transition is interrupted (as it is
>>> here) then the PE won't try to promote it (because the value didn't
>>> change).
>>
>> I did not get the last part.
>> Why would it not be promoted,
>> even though it has positive master score?
>
> Because we don't know that we need to run the PE again - because the
> only changes in the PE were things we expected.
>
> See:
> https://github.com/beekhof/pacemaker/commit/65f1a22a4b66581159d8b747dbd49fa5e2ef34e1
>
> This "only" becomes an issue when the transition is interrupted
> between the non-recurring monitor and the start, which I guess was
> rare enough that we hadn't noticed it for 4 years :-(
>
>>
>>> Has the drbd RA always done this?
>>
>> Yes.
>>
>> When else should we call crm_master?
>
> I guess the only situation you shouldn't is during a non-recurring
> monitor if you're about to return 7.
> Which I'll concede isn't exactly obvious.

I'm thinking about applying this, which restricts the previous patch
to cases when the state of the resource is unknown.
Existing regression tests appear to pass while enabling the expected
behavior here.

Can anyone see something wrong with it?

diff --git a/pengine/master.c b/pengine/master.c
index 7af1936..77a82e6 100644
--- a/pengine/master.c
+++ b/pengine/master.c
@@ -410,14 +410,18 @@ master_score(resource_t * rsc, node_t * node, int not_set_value)
return score;
}
- if (rsc->fns->state(rsc, TRUE) < RSC_ROLE_STARTED) {
- return score;
- }
+ if (node == NULL) {
+ if(rsc->fns->state(rsc, TRUE) < RSC_ROLE_STARTED) {
+ crm_trace("Ignoring master score for %s: unknown state",
+ rsc->id);
+ return score;
+ }
- if (node != NULL) {
+ } else {
node_t *match = pe_find_node_id(rsc->running_on, node->details->id);
+ node_t *known = pe_hash_table_lookup(rsc->known_on, node->details->id);
- if (match == NULL) {
+ if (match == NULL && known == NULL) {
crm_trace("%s is not active on %s - ignoring", rsc->id,
node->details->uname);
return score;
}
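
For anyone who wants to reason about the condition without reading the
hunk, here is a rough standalone model of it in C. The enum, the struct
and both function names are made up for illustration and are not
Pacemaker's real types; only the shape of the checks is taken from the
diff above, so treat it as a sketch rather than the actual code.

```c
#include <stdbool.h>

/* Simplified stand-in for the resource roles (hypothetical, ordered so
 * that "not yet started" compares below ROLE_STARTED). */
enum role { ROLE_UNKNOWN = 0, ROLE_STOPPED, ROLE_STARTED, ROLE_SLAVE, ROLE_MASTER };

/* Hypothetical, flattened view of what master_score() consults. */
struct rsc_model {
    enum role role;        /* what rsc->fns->state(rsc, TRUE) would report */
    bool running_on_node;  /* node was found in rsc->running_on */
    bool known_on_node;    /* node was found in rsc->known_on (already probed) */
};

/* Before the patch: any resource that is not started has its master
 * score ignored, and so does one that is not active on the given node. */
static bool score_ignored_old(const struct rsc_model *rsc, bool have_node)
{
    if (rsc->role < ROLE_STARTED) {
        return true;
    }
    return have_node && !rsc->running_on_node;
}

/* After the patch: the role check applies only when no node is given.
 * With a node, the score is ignored only if the resource is neither
 * active nor known (probed) there. */
static bool score_ignored_new(const struct rsc_model *rsc, bool have_node)
{
    if (!have_node) {
        return rsc->role < ROLE_STARTED;
    }
    return !rsc->running_on_node && !rsc->known_on_node;
}
```

The case this thread is about is a resource that has been probed on a
node but not started yet: role below ROLE_STARTED, not running, but
known. In this model the old check ignores the score set during the
probe, while the new one counts it.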