On Jun 25, 2008, at 5:30 PM, Dejan Muhamedagic wrote:
On Wed, Jun 25, 2008 at 04:56:17PM +0200, Andrew Beekhof wrote:
On Wed, Jun 25, 2008 at 14:57, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
On Wed, Jun 25, 2008 at 6:15 AM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
On Wed, Jun 25, 2008 at 5:29 AM, Dominik Klein <[EMAIL PROTECTED]telegence.net> wrote:
Junko IKEDA wrote:
Unfortunately, the latest package produced the same results.
pgsql did not fail over when using crm_resource -F.
I think you perhaps misunderstand what -F does... it is intended to tell the cluster that the resource failed.
Although the resource may move as well (depending on how you set up the scores), this is not the primary goal.
pgsql is configured to move to the other node if it fails.
If crm_resource -F is called, pgsql's fail-count should increase from 0 to 1, so pgsql should move to the appropriate node.
But pgsql was just stopped, not moved.
Other resources were still running.
Ah, ok. Sorry, I just wanted to make sure the intended functionality was clear.
I had a look at the report, and analysis.txt highlights the problem quite well:
pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error: prmApPostgreSQLDB_fail_60000 failed with rc=2.
pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Preventing prmApPostgreSQLDB from re-starting anywhere in the cluster
It looks like the RA (incorrectly) returned 2 (invalid parameter) instead of 3 (unimplemented function).
rc=2 tells the cluster that the configuration is invalid and not to bother starting the resource elsewhere.
!!! So that means there might be a problem in the pgsql RA?
Thanks,
Junko
http://hg.linux-ha.org/dev/file/42ce605e3da5/resources/OCF/pgsql
Look at the end of the script.
If it is invoked with any action other than the ones it handles, it calls usage, which exits with OCF_ERR_ARGS (i.e. 2). See how it was called; this should be the reason.
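For illustration, here is a minimal sketch of an action dispatcher that reports unknown actions as "unimplemented" (3) instead of falling through to an argument error (2). It assumes a POSIX shell RA; it is NOT the pgsql RA or the patch mentioned in this thread, and the names dispatch, ra_start, ra_stop, ra_monitor, ra_validate and ra_meta_data are hypothetical stubs. The exit codes are defined inline so the sketch runs on its own.

```shell
#!/bin/sh
# Hypothetical action dispatcher for an OCF resource agent.
# NOT the pgsql RA; all handlers below are stubs.

# OCF exit codes, defined inline so the sketch is self-contained.
# A real RA would normally get these from the OCF shell function library.
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_UNIMPLEMENTED=3

usage() {
    echo "usage: $0 {start|stop|status|monitor|validate-all|meta-data}" >&2
}

# Stub handlers standing in for the real start/stop/monitor logic.
ra_start()    { return $OCF_SUCCESS; }
ra_stop()     { return $OCF_SUCCESS; }
ra_monitor()  { return $OCF_SUCCESS; }
ra_validate() { return $OCF_SUCCESS; }
ra_meta_data() {
    # A real RA prints its full XML metadata here.
    echo '<?xml version="1.0"?>'
    return $OCF_SUCCESS
}

dispatch() {
    case "$1" in
        start)          ra_start ;;
        stop)           ra_stop ;;
        status|monitor) ra_monitor ;;
        validate-all)   ra_validate ;;
        meta-data)      ra_meta_data ;;
        "")
            # No action given at all: a genuine argument error.
            usage
            return $OCF_ERR_ARGS ;;
        *)
            # Any action this RA does not handle (e.g. "fail" or "notify"):
            # report "unimplemented" rather than "invalid argument".
            usage
            return $OCF_ERR_UNIMPLEMENTED ;;
    esac
}

# A real RA would end by dispatching its first argument:
#   dispatch "$1"; exit $?
```

With this shape, an invocation with the "fail" action would exit 3 rather than 2, which is how the thread says an RA should signal an action it doesn't support.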
I wonder how this could pass ocf-tester. It does not support any of the notify operations, nor validate-all, nor meta-data.
Or am I looking at the wrong file?
You are looking at the right file, and I submitted a patch for this problem a couple of weeks ago.
And here is one more patch that fixes the problem.
I don't think that there's a need for an RA to support the fail action.
Correct, but it should tell us when there is an action it doesn't support :)
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems