On Thursday 13 September 2007, Dominik Klein wrote:
> > http://www.linux-ha.org/v2/faq
> >
> > That's where you'll find one example of calculating scores. Check
> > also the list archives---quite a few times this issue came up.
>
> Of course I read that.
>
> What I want to know is: Does the cluster make a difference between
> monitor result "error" and monitor result "not running". From what I
> saw, it looked like the decision was different.
>
> error: stickiness is still applied, then calculated
>
> not running: stickiness value is removed from current scores, then a
> decision is made where to run the resource and *then* the stickiness
> value is applied again.
>
> That's what I wanted to know and I did not come across an answer yet.

In general:
- the score calculation does not care about errors/not running. All it
  cares about is the resource fail-count for a node.
The score for a resource R on a node N is:

  score(R,N) = resource_stickiness(R)
             + (failcount(R,N) * failure_stickiness(R))
             + constraint_score(R,N)

where resource_stickiness(R) = 0 for all nodes where R is not running.

So I bet your next question is: when is the fail-count increased? ;-)
And here it is (for version 2.1.1):

a) * precondition: R was running on N, which means the following happened:
     - R was probed on N as not running
     - R was started on N
     - R start operation was called and returned OK
   * if this precondition is met, heartbeat does not distinguish between
     a monitor return of "not running" and "error". This means: if monitor
     returns "not running" without(!) a preceding "resource stop", it is
     treated as an "error"
     --> failcount for R on N increases by 1

b) * condition: failed start, which means:
     - R was probed on N as not running
     - R was started on N
     - R start returned ERROR (not sure if monitor is called just after
       start)
   * regardless of what monitor reports (error/not running), the fail-count
     does not go up, because the resource never really started (this
     behaviour is currently under discussion, AFAIK)

NOTE: be aware that heartbeat sets the score for a resource with a failed
start on this node to -INFINITY until you clear the node/resource with

  crm_resource -C -r resource -H node

---> since -INFINITY overrules all other scores, the failcount of R on N
doesn't really matter anymore.

And here a hint ...
The best way is always to try it out: use a Dummy resource and set up the
configuration. You are faster with testing than with writing the email.
Worst of all, those things change over time, and most developers only know
the "current state of the art" (which does not help you if you use version
x.y.z). You have to test it out anyway with the real cluster.

kind regards
Max
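
As a rough illustration of the score formula and the -INFINITY note above, here is a minimal Python sketch. It is not heartbeat code: the function name `node_score`, its parameters and the example values are made up for illustration, and the value 1000000 for INFINITY is an assumption about how heartbeat represents it.

```python
# Hypothetical model of the score formula from this mail, not heartbeat code.
INFINITY = 1000000  # assumed numeric value of heartbeat's INFINITY


def node_score(resource_stickiness, failure_stickiness, constraint_score,
               failcount, running_here, start_failed=False):
    """Score of resource R on node N, following the formula in this mail."""
    if start_failed:
        # A failed start pins the node to -INFINITY until it is cleaned up,
        # so fail-count and stickiness no longer matter.
        return -INFINITY
    # resource_stickiness only counts on the node where R is running.
    stickiness = resource_stickiness if running_here else 0
    return stickiness + failcount * failure_stickiness + constraint_score


# Example values (made up): stickiness 100, failure stickiness -30,
# constraint score 50, two recorded failures.
current = node_score(100, -30, 50, failcount=2, running_here=True)   # 90
other = node_score(100, -30, 50, failcount=2, running_here=False)    # -10
failed = node_score(100, -30, 50, failcount=0, running_here=False,
                    start_failed=True)                               # -1000000
print(current, other, failed)
```

Comparing such numbers per node against what a Dummy resource actually does in a test cluster is a quick way to check whether your version behaves as described here.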
