On Wed, Aug 5, 2015 at 4:04 PM, Jehan-Guillaume de Rorthais <j...@dalibo.com> wrote:
> Hi guys,
>
> We are still working on our new PostgreSQL resource agent.
>
> We have more or less made up our minds about the promotion issue (see the ML
> thread "problem with master score limited to 1000000") and found an
> acceptable algorithm.
>
> Now that we are testing this RA, I found a strange behavior of the CRM with a
> simple failure scenario: the master resource is stopped.
>
> When I gracefully stop the master,

You mean - stop postgres outside of pacemaker?

> the CRM tries to recover the resource with:
>
> * demote it
> * stop it
> * start it
> * promote it
>
> Sounds logical, but it fails at the first step because the master is actually
> stopped. According to the "ra-dev-guide", the RA should return OCF_ERR_GENERIC
> if the resource is stopped on demote. See:
>
> http://www.linux-ha.org/doc/dev-guides/_literal_demote_literal_action.html
>
> When I teach my RA to follow this, the CRM keeps trying the same transition
> again and again until the failcount reaches the migration-threshold. Then it
> stops trying to recover it and moves the resource to another node.
>
> Same result if the RA returns OCF_NOT_RUNNING from the demote action instead
> of OCF_ERR_GENERIC.
>
> I could try to obey the CRM, start the resource as a slave and return
> OCF_SUCCESS, but that sounds ridiculous as it will be stopped at the very
> next step, then started again one step later...
>
> Did I miss something? Is this behavior normal? Any advice on how to fix this?
>
> Regards,
> --
> Jehan-Guillaume de Rorthais
> Dalibo
> http://www.dalibo.com
>
> _______________________________________________
> Developers mailing list
> Developers@clusterlabs.org
> http://clusterlabs.org/mailman/listinfo/developers