On Wed, Jun 25, 2008 at 04:56:17PM +0200, Andrew Beekhof wrote: > On Wed, Jun 25, 2008 at 14:57, Serge Dubrouski <[EMAIL PROTECTED]> wrote: > > On Wed, Jun 25, 2008 at 6:15 AM, Serge Dubrouski <[EMAIL PROTECTED]> wrote: > >> On Wed, Jun 25, 2008 at 5:29 AM, Dominik Klein <[EMAIL PROTECTED]> wrote: > >>> Junko IKEDA wrote: > >>>>>>>> > >>>>>>>> Unfortunately, the latest package produced the same results. > >>>>>>>> pgsql couldn't fail over using crm_resource -F. > >>>>>>> > >>>>>>> I think you perhaps misunderstand what -F does... it is intended to > >>>>>>> tell the cluster that the resource failed. > >>>>>>> Although it may move as well (depending on how you set up the scores), > >>>>>>> this is not the primary goal. > >>>>>> > >>>>>> pgsql is set as, moves to the other node if it fails. > >>>>>> If crm_resrouce -F is called, pgsql's fail-count would be increased > >>>>>> from > >>>> > >>>> 0 > >>>>>> > >>>>>> to 1, > >>>>>> so pgsql should move to the appropriate node. > >>>>>> but pgsql was just stopped, and not moved. > >>>>>> Other resources were still running. > >>>>> > >>>>> Ah ok, sorry just wanted to make sure the intended functionality was > >>>> > >>>> clear. > >>>>> > >>>>> I had a look at the report and analysis.txt highlights the problem quite > >>>> > >>>> well: > >>>>> > >>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error: > >>>>> prmApPostgreSQLDB_fail_60000 failed with rc=2. > >>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Preventing > >>>>> prmApPostgreSQLDB from re-starting anywhere in the cluster > >>>>> > >>>>> It looks like the RA (incorrectly) returned 2 (invalid parameter), > >>>>> instead of 3 (unimplemented function). > >>>>> rc=2 tells the cluster that the configuration is invalid and not to > >>>>> bother starting the resource elsewhere. > >>>> > >>>> !!! that means, there might be a problem at pgsql RA? > >>>> > >>>> Thanks, > >>>> Junko > >>>> > >>>> > >>> > >>> http://hg.linux-ha.org/dev/file/42ce605e3da5/resources/OCF/pgsql > >>> > >>> Look at the end of the script. > >>> > >>> If it is invoked in any other way, it calls usage which exits OCF_ERR_ARGS > >>> (ie 2). See how it was called. This should be the reason. > >>> > >>> I wonder how this could pass ocf-tester. It does not support any of the > >>> notify operations nor validate-all nor meta-data. > >>> > >>> Or am I looking at the wrong file? > >> > >> You are looking at the right file, and I submitted a patch for this > >> problem a couple of weeks ago. > >> > > And here is one more patch that fixes the problem.
I don't think that there's a need for an RA to support the fail action. > > Also I have a > > couple of questions: > > > > 1. What is 'fail' operation is supposed to do? > > "fail" :-) A typical use case should be when one wants to inform the cluster that the resource failed in an asynchronous manner. Thanks, Dejan _______________________________________________ Linux-HA mailing list [email protected] http://lists.linux-ha.org/mailman/listinfo/linux-ha See also: http://linux-ha.org/ReportingProblems
