All,

I have recently run into strange behavior in the CRM. I have an OCF-compliant RA (ocf-tester considers it to be such). The resource is supposed to fail over to the other node on the very first failure, and it does. However, I noticed that the resource has its fail counter set to INFINITY on the node from which it failed over, and crm_mon reports its start as a failed action.

I looked into the logs and found that, for some reason, "start" was called again after "monitor" returned OCF_ERR_GENERIC and "stop" had executed successfully. I expected that after "monitor" returns an error, HA would call "stop" once. After that, the PE should recalculate the scores according to the fail counter value, stickiness, and so on, and decide what action to take. Instead, "start" was called on the failed resource and the resource was fenced. I had to run crm_resource -C manually to allow my resource to run on this node again.

Could anybody suggest how I could debug this to find out what's going on?

TIA,
Alex T
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
