Hi guys,

We are still working on our new PostgreSQL resource agent.
We have more or less made up our minds on the promotion issue (see the ML thread "problem with master score limited to 1000000") and found an acceptable algorithm.

Now that we are testing this RA, I have found a strange behavior of the CRM in a simple failure scenario: the master resource is stopped.

When I gracefully stop the master, the CRM tries to recover the resource with:

  * demote it
  * stop it
  * start it
  * promote it

Sounds logical, but it fails at the first step because the master is actually stopped.

According to the "ra-dev-guide", the RA should return OCF_ERR_GENERIC if the resource is stopped on demote. See:

  http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html

When I teach my RA to follow this, the CRM keeps trying the same transition again and again until the failcount reaches the migration-threshold. Then it stops trying to recover the resource and moves it to another node.

The result is the same if the RA returns OCF_NOT_RUNNING from the demote action instead of OCF_ERR_GENERIC.

I could try to obey the CRM, start the resource as a slave and return OCF_SUCCESS, but that sounds ridiculous, as it would be stopped at the very next step, then started again one step later...

Did I miss something? Is this behavior normal? Any advice on how to fix this?

Regards,
-- 
Jehan-Guillaume de Rorthais
Dalibo
http://www.dalibo.com
_______________________________________________
Developers mailing list
Developers@clusterlabs.org
http://clusterlabs.org/mailman/listinfo/developers