On Friday 14 September 2007, Dominik Klein wrote:
> >> Yes I meant the resource is running first and crashes later on, so that
> >> monitor reports "not running".
> >
> > generally, one shouldn't report "not running" in such cases
>
> Okay, maybe I should have read this more precisely.
> http://www.linux-ha.org/OCFResourceAgent
> "monitor - monitor the health of a resource. Exit 0 if the resource is
> running, 7 if it is stopped and anything else if it is failed"
>
> Okay. Good to know. But how can I (my RA) know whether Linux-HA expects
> my resource to run or not to run when it calls the monitor script?
> Iirc it calls "monitor" on probe and on monitor action. Is there a way
> to determine what it expects to get? Because the way I understand it
> now, I have to return OCF_NOT_RUNNING in case "monitor" is called by
> probe and the resource is not running and return OCF_ERR_GENERIC (or
> some other non-0 and non-7 value) if "monitor" is called by monitor and
> the resource is not running.

The RA is assumed to be dumb...
You have two choices:

a) You keep track of whether it was started or stopped (with a lock file, for example), and when the resource has "disappeared" you return ERROR.

b) You don't care about start/stop and simply return not-running whenever the resource is not running; you return ERROR only when it is running but you are able to detect that something is wrong.

In both cases heartbeat handles it the same way, because the CRM keeps track of whether the resource was successfully started and so notices when "not-running" is returned even though no stop operation was called.

In case a) you just have to make sure your lock file is deleted by the stop operation, and also cleaned up when the node is shut down the hard way (power-off etc.).

As far as I remember, heartbeat (>2.1.0) provides a kind of lock file framework for RAs. Use this and you will do fine. Check the existing RAs and the files they include.
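
A minimal sketch of what option a) could look like in a shell RA. The file paths and the myres_* function names are made up for illustration; they are not taken from any existing RA or from heartbeat's own lock file helpers. Only the exit code values (0, 1, 7) come from the OCF convention quoted above.

```shell
#!/bin/sh
# OCF exit codes used by monitor.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical locations; a real RA would derive these from its parameters.
LOCKFILE="${LOCKFILE:-/tmp/myres.started}"
PIDFILE="${PIDFILE:-/tmp/myres.pid}"

# True if the PID file exists and the process it names is alive.
myres_is_running() {
    [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null
}

# Called at the end of a successful start operation.
myres_mark_started() {
    touch "$LOCKFILE"
}

# Called at the end of the stop operation.
myres_mark_stopped() {
    rm -f "$LOCKFILE"
}

myres_monitor() {
    if myres_is_running; then
        return $OCF_SUCCESS
    fi
    if [ -f "$LOCKFILE" ]; then
        # Started by us and never stopped, yet gone: report a failure.
        return $OCF_ERR_GENERIC
    fi
    # Never started (for example on a probe) or cleanly stopped.
    return $OCF_NOT_RUNNING
}
```

For option b) the lock file check would simply be dropped, so monitor returns 7 whenever the process is gone. If the lock file lives somewhere that is wiped at boot, a hard power-off cleans it up as well; whether /tmp behaves that way depends on the system.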
