On Wed, Jun 25, 2008 at 14:57, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 25, 2008 at 6:15 AM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
>> On Wed, Jun 25, 2008 at 5:29 AM, Dominik Klein <[EMAIL PROTECTED]> wrote:
>>> Junko IKEDA wrote:
>>>>>>>>
>>>>>>>> Unfortunately, the latest package produced the same results.
>>>>>>>> pgsql couldn't fail over using crm_resource -F.
>>>>>>>
>>>>>>> I think you perhaps misunderstand what -F does... it is intended to
>>>>>>> tell the cluster that the resource failed.
>>>>>>> Although it may move as well (depending on how you set up the scores),
>>>>>>> this is not the primary goal.
>>>>>>
>>>>>> pgsql is set as, moves to the other node if it fails.
>>>>>> If crm_resource -F is called, pgsql's fail-count would be increased
>>>>>> from 0 to 1,
>>>>>> so pgsql should move to the appropriate node.
>>>>>> but pgsql was just stopped, and not moved.
>>>>>> Other resources were still running.
>>>>>
>>>>> Ah ok, sorry just wanted to make sure the intended functionality was
>>>>> clear.
>>>>>
>>>>> I had a look at the report and analysis.txt highlights the problem quite
>>>>> well:
>>>>>
>>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error:
>>>>> prmApPostgreSQLDB_fail_60000 failed with rc=2.
>>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Preventing
>>>>> prmApPostgreSQLDB from re-starting anywhere in the cluster
>>>>>
>>>>> It looks like the RA (incorrectly) returned 2 (invalid parameter),
>>>>> instead of 3 (unimplemented function).
>>>>> rc=2 tells the cluster that the configuration is invalid and not to
>>>>> bother starting the resource elsewhere.
>>>>
>>>> !!! that means, there might be a problem at pgsql RA?
>>>>
>>>> Thanks,
>>>> Junko
>>>>
>>>
>>> http://hg.linux-ha.org/dev/file/42ce605e3da5/resources/OCF/pgsql
>>>
>>> Look at the end of the script.
>>>
>>> If it is invoked in any other way, it calls usage which exits OCF_ERR_ARGS
>>> (ie 2). See how it was called. This should be the reason.
>>>
>>> I wonder how this could pass ocf-tester. It does not support any of the
>>> notify operations nor validate-all nor meta-data.
>>>
>>> Or am I looking at the wrong file?
>>
>> You are looking at the right file, and I submitted a patch for this
>> problem a couple of weeks ago.
>>
> And here is one more patch that fixes the problem. Also I have a
> couple of questions:
>
> 1. What is 'fail' operation is supposed to do?
"fail" :-)

> 2. Why '-F' option isn't described in the help message for crm_resource

An oversight, I guess.
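
For anyone following along, here is a rough sketch of the return-code point made earlier in the thread: an action the RA does not know about (such as "fail") should come back as 3 (unimplemented) rather than 2 (invalid parameter), so the cluster does not conclude the configuration is bad everywhere. This is not the actual pgsql RA or the submitted patch; the dispatch function and the list of supported actions are made up for illustration, and only the numeric codes come from the discussion above.

```shell
#!/bin/sh
# OCF return codes relevant to this thread.
OCF_SUCCESS=0
OCF_ERR_ARGS=2            # invalid parameter: cluster will not start it elsewhere
OCF_ERR_UNIMPLEMENTED=3   # action not supported by this agent

# Hypothetical dispatcher standing in for the case statement at the end of an RA.
dispatch() {
    case "$1" in
        start|stop|status|monitor)
            # A real agent would call its handler function here.
            return $OCF_SUCCESS
            ;;
        "")
            # No action given at all: a genuine usage error.
            return $OCF_ERR_ARGS
            ;;
        *)
            # Unknown action (e.g. "fail"): report it as unimplemented
            # instead of falling through to usage and exiting with 2.
            return $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}

rc=0
dispatch "fail" || rc=$?
echo "rc for 'fail': $rc"
```

The point of the sketch is that the catch-all branch is separate from the usage-error branch, so an unsupported operation and a badly invoked script are reported differently.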
