Re: [Linux-HA] sometimes crm_resource -F fails

Dejan Muhamedagic Wed, 25 Jun 2008 08:31:19 -0700

On Wed, Jun 25, 2008 at 04:56:17PM +0200, Andrew Beekhof wrote:
> On Wed, Jun 25, 2008 at 14:57, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> > On Wed, Jun 25, 2008 at 6:15 AM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> >> On Wed, Jun 25, 2008 at 5:29 AM, Dominik Klein <[EMAIL PROTECTED]> wrote:
> >>> Junko IKEDA wrote:
> >>>>>>>>
> >>>>>>>> Unfortunately, the latest package produced the same results.
> >>>>>>>> pgsql couldn't fail over using crm_resource -F.
> >>>>>>>
> >>>>>>> I think you perhaps misunderstand what -F does... it is intended to
> >>>>>>> tell the cluster that the resource failed.
> >>>>>>> Although it may move as well (depending on how you set up the scores),
> >>>>>>> this is not the primary goal.
> >>>>>>
> >>>>>> pgsql is configured to move to the other node if it fails.
> >>>>>> If crm_resource -F is called, pgsql's fail-count would be increased
> >>>>>> from 0 to 1,
> >>>>>> so pgsql should move to the appropriate node.
> >>>>>> But pgsql was just stopped, and not moved.
> >>>>>> Other resources were still running.
> >>>>>
> >>>>> Ah ok, sorry, just wanted to make sure the intended functionality was
> >>>>> clear.
> >>>>>
> >>>>> I had a look at the report and analysis.txt highlights the problem quite
> >>>>> well:
> >>>>>
> >>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error:
> >>>>> prmApPostgreSQLDB_fail_60000 failed with rc=2.
> >>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op:   Preventing
> >>>>> prmApPostgreSQLDB from re-starting anywhere in the cluster
> >>>>>
> >>>>> It looks like the RA (incorrectly) returned 2 (invalid parameter),
> >>>>> instead of 3 (unimplemented function).
> >>>>> rc=2 tells the cluster that the configuration is invalid and not to
> >>>>> bother starting the resource elsewhere.
> >>>>
> >>>> !!! Does that mean there might be a problem in the pgsql RA?
> >>>>
> >>>> Thanks,
> >>>> Junko
> >>>>
> >>>>
> >>>
> >>> http://hg.linux-ha.org/dev/file/42ce605e3da5/resources/OCF/pgsql
> >>>
> >>> Look at the end of the script.
> >>>
> >>> If it is invoked in any other way, it calls usage, which exits with
> >>> OCF_ERR_ARGS (i.e. 2). See how it was called. This should be the reason.
> >>>
> >>> I wonder how this could pass ocf-tester. It does not support any of the
> >>> notify operations nor validate-all nor meta-data.
> >>>
> >>> Or am I looking at the wrong file?
> >>
> >> You are looking at the right file, and I submitted a patch for this
> >> problem a couple of weeks ago.
> >>
> > And here is one more patch that fixes the problem.


I don't think that there's a need for an RA to support the fail
action.
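
To illustrate the return-code point from the quoted analysis, here is a
minimal sketch (not the actual pgsql RA) of an action dispatcher that
answers an action it does not know, such as "fail", with
OCF_ERR_UNIMPLEMENTED (3) instead of OCF_ERR_ARGS (2). The function names
ra_dispatch and demo_* are made up for illustration; only the numeric
return codes are the standard OCF ones.

```shell
# Standard OCF return codes.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_ERR_ARGS=2
OCF_ERR_UNIMPLEMENTED=3

# Hypothetical stand-ins for a real RA's actions (assumption: a real
# agent would start, stop and check the service here).
demo_start()   { return $OCF_SUCCESS; }
demo_stop()    { return $OCF_SUCCESS; }
demo_monitor() { return $OCF_SUCCESS; }

# Dispatch one action name and return the matching OCF code.
ra_dispatch() {
    case "$1" in
        start)   demo_start ;;
        stop)    demo_stop ;;
        monitor) demo_monitor ;;
        # Unknown action (e.g. "fail", "notify"): report "not implemented"
        # rather than "invalid configuration", so the cluster does not
        # treat it as a hard error.
        *)       return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}
```

With this, "ra_dispatch fail; echo $?" prints 3, whereas an agent that
falls through to a usage function exiting with OCF_ERR_ARGS would yield 2.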

> > Also I have a
> > couple of questions:
> >
> > 1. What is the 'fail' operation supposed to do?
> 
> "fail" :-)

A typical use case would be when one wants to inform the cluster
asynchronously that the resource has failed.

Thanks,

Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
