On 5/9/06, Peter Kruse <[EMAIL PROTECTED]> wrote:
> Hello,
>
> it seems that in 2.0.5 the attribute rsc_state of lrm_rsc_op has
> disappeared and has been replaced by rc_code and op_status.
> But they are not the same. To clear errors in the
> CIB, so that resources are started again or nodes can take over
> again, I used to do something like this:
> search the "<lrm_rsc_op" entries for "rsc_state=start_failed" or
> "rsc_state=monitor_failed" and clear each match with:
>
> crm_resource -C -r $resname -t primitive -H $hostname
>
> But this no longer works the same way, because the return code can be
> non-zero as the result of a probe action, which in that
> case does not mean an error. So how can I find out which
> resources I can safely delete from the LRM without
> causing much harm?

rc_code and op_status took over from rsc_state long ago.
Actually, I'm not sure rsc_state was ever used by the CRM for
anything except logging.

This is from http://www.linux-ha.org/v2/dtd1.0/annotated :
rsc_state is the state of the resource after the action completed and
should be used as a guide only.

The problem was that it was often wrong, because it knew nothing of
return codes either.

If you want a list of failed resources: crm_mon -1 | grep failed

If you just want the lrm_rsc_op entries that failed, look for
rc_code != 0 && rc_code != 7 (where 7 is LSB for "Safely Stopped") in the
output of cibadmin -Ql -o status. A probe of a resource that is not
running returns 7, so those entries are not errors.
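For anyone scripting this, here is a minimal sketch of the rc_code filter above, assuming the status section nests lrm_rsc_op inside lrm_resource inside node_state, with the node name in node_state's "uname" attribute (an assumption about the 2.0.x layout, not something stated in this thread). The file name failed_ops.py and the functions failed_ops and cleanup_commands are hypothetical. The script only prints crm_resource -C commands for review; it does not run them.

```python
import sys
import xml.etree.ElementTree as ET

# rc_code values treated as "not a failure", per the rule above:
# 0 = success, 7 = "safely stopped" (e.g. a probe of a stopped resource).
OK_RC_CODES = {0, 7}


def failed_ops(status_xml):
    """Return (node, resource, op_id, rc_code) for every lrm_rsc_op whose
    rc_code is neither 0 nor 7.

    ASSUMPTION: node_state > ... > lrm_resource > lrm_rsc_op nesting, with
    the node name in node_state's 'uname' attribute.
    """
    root = ET.fromstring(status_xml)
    failures = []
    for node in root.iter("node_state"):
        node_name = node.get("uname")
        for rsc in node.iter("lrm_resource"):
            for op in rsc.iter("lrm_rsc_op"):
                rc = op.get("rc_code")
                if rc is None:
                    continue  # no return code recorded; nothing to judge
                if int(rc) not in OK_RC_CODES:
                    failures.append((node_name, rsc.get("id"), op.get("id"), int(rc)))
    return failures


def cleanup_commands(failures):
    """Build one cleanup command per failed (node, resource) pair, using the
    same crm_resource flags as the original mail. Nothing is executed."""
    pairs = dict.fromkeys((node, rsc) for node, rsc, _op, _rc in failures)
    return [f"crm_resource -C -r {rsc} -t primitive -H {node}" for node, rsc in pairs]


def main(argv):
    if not argv:
        print("usage: failed_ops.py FILE  (use '-' to read stdin)", file=sys.stderr)
        return
    if argv[0] == "-":
        data = sys.stdin.read()
    else:
        with open(argv[0]) as f:
            data = f.read()
    failures = failed_ops(data)
    for node, rsc, op_id, rc in failures:
        print(f"{node}: {rsc} op {op_id} rc_code={rc}")
    for cmd in cleanup_commands(failures):
        print(cmd)


if __name__ == "__main__":
    main(sys.argv[1:])
```

Usage would be something like `cibadmin -Ql -o status | python3 failed_ops.py -`. It checks rc_code only, as suggested above, and ignores op_status. Printing instead of executing keeps a human in the loop before anything is removed from the LRM.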
_______________________________________________________
Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
