On Wed, Jun 25, 2008 at 8:56 AM, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 25, 2008 at 14:57, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
>> On Wed, Jun 25, 2008 at 6:15 AM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
>>> On Wed, Jun 25, 2008 at 5:29 AM, Dominik Klein <[EMAIL PROTECTED]> wrote:
>>>> Junko IKEDA wrote:
>>>>>>>>>
>>>>>>>>> Unfortunately, the latest package produced the same results.
>>>>>>>>> pgsql couldn't fail over using crm_resource -F.
>>>>>>>>
>>>>>>>> I think you perhaps misunderstand what -F does... it is intended to
>>>>>>>> tell the cluster that the resource failed.
>>>>>>>> Although it may move as well (depending on how you set up the scores),
>>>>>>>> this is not the primary goal.
>>>>>>>
>>>>>>> pgsql is set as, moves to the other node if it fails.
>>>>>>> If crm_resrouce -F is called, pgsql's fail-count would be increased
>>>>>>> from 0 to 1,
>>>>>>> so pgsql should move to the appropriate node.
>>>>>>> but pgsql was just stopped, and not moved.
>>>>>>> Other resources were still running.
>>>>>>
>>>>>> Ah ok, sorry just wanted to make sure the intended functionality was
>>>>>> clear.
>>>>>>
>>>>>> I had a look at the report and analysis.txt highlights the problem
>>>>>> quite well:
>>>>>>
>>>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error:
>>>>>> prmApPostgreSQLDB_fail_60000 failed with rc=2.
>>>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Preventing
>>>>>> prmApPostgreSQLDB from re-starting anywhere in the cluster
>>>>>>
>>>>>> It looks like the RA (incorrectly) returned 2 (invalid parameter),
>>>>>> instead of 3 (unimplemented function).
>>>>>> rc=2 tells the cluster that the configuration is invalid and not to
>>>>>> bother starting the resource elsewhere.
>>>>>
>>>>> !!! that means, there might be a problem at pgsql RA?
>>>>>
>>>>> Thanks,
>>>>> Junko
>>>>>
>>>>
>>>> http://hg.linux-ha.org/dev/file/42ce605e3da5/resources/OCF/pgsql
>>>>
>>>> Look at the end of the script.
>>>>
>>>> If it is invoked in any other way, it calls usage which exits OCF_ERR_ARGS
>>>> (ie 2). See how it was called. This should be the reason.
>>>>
>>>> I wonder how this could pass ocf-tester. It does not support any of the
>>>> notify operations nor validate-all nor meta-data.
>>>>
>>>> Or am I looking at the wrong file?
>>>
>>> You are looking at the right file, and I submitted a patch for this
>>> problem a couple of weeks ago.
>>>
>> And here is one more patch that fixes the problem. Also I have a
>> couple of questions:
>>
>> 1. What is 'fail' operation is supposed to do?
>
> "fail" :-)

That is too broad an explanation :-)

I just wonder what the best implementation of the "fail" action in an
RA would be. In this "fixed" version pgsql just reports
"NOT_IMPLEMENTED", the CRM increases the fail-count, and if the score
still allows the resource to stay on the current node, nothing else
happens. I suspect one would expect a resource to be moved off the
current node when "crm_resource -F" is called, but I don't know how to
implement that correctly at the RA level. Maybe the best way would be
for the CRM not just to increase the fail-count but to set it to a
value high enough to fail the resource over to another node? In that
case the RA would just stop the resource when it is called with the
"fail" action.

>
>> 2. Why '-F' option isn't described in the help message for crm_resource
>
> an oversight i guess

--
Serge Dubrouski.
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
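
For reference, the return-code point made earlier in this thread (an
unsupported action such as "fail" should yield 3, not 2) can be sketched
roughly as follows. This is only an illustration and not the actual
patch: ra_dispatch is a hypothetical name, the handlers are placeholders,
and the exit codes are defined locally so that the sketch is
self-contained, whereas a real RA would normally get them from the OCF
shell function library. Treating a missing action name as an argument
error is likewise an assumption.

```shell
#!/bin/sh
# OCF exit codes, defined locally for this sketch only.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_ARGS=2            # "invalid parameter": cluster will not restart the resource anywhere
OCF_ERR_UNIMPLEMENTED=3   # "unimplemented function"

# ra_dispatch ACTION
# Hypothetical dispatcher: returns the exit code the RA would exit with.
ra_dispatch() {
    case "$1" in
        start|stop|status|monitor|meta-data|validate-all)
            # Placeholder: a real RA would call its own handler here.
            return $OCF_SUCCESS
            ;;
        "")
            # No action given at all: treated here as a caller error (assumption).
            return $OCF_ERR_ARGS
            ;;
        *)
            # Unknown or unsupported action, e.g. "fail" or "notify".
            return $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}
```

With something like this, "ra_dispatch fail" yields rc=3, which the
cluster should count as an ordinary failure rather than as an invalid
configuration.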
