On Wed, Jun 25, 2008 at 14:57, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 25, 2008 at 6:15 AM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
>> On Wed, Jun 25, 2008 at 5:29 AM, Dominik Klein <[EMAIL PROTECTED]> wrote:
>>> Junko IKEDA wrote:
>>>>>>>>
>>>>>>>> Unfortunately, the latest package produced the same results.
>>>>>>>> pgsql couldn't fail over using crm_resource -F.
>>>>>>>
>>>>>>> I think you perhaps misunderstand what -F does... it is intended to
>>>>>>> tell the cluster that the resource failed.
>>>>>>> Although it may move as well (depending on how you set up the scores),
>>>>>>> this is not the primary goal.
>>>>>>
>>>>>> pgsql is configured to move to the other node if it fails.
>>>>>> If crm_resource -F is called, pgsql's fail-count would be increased from
>>>>>> 0 to 1,
>>>>>> so pgsql should move to the appropriate node.
>>>>>> But pgsql was just stopped, and not moved.
>>>>>> Other resources were still running.
>>>>>
>>>>> Ah ok, sorry just wanted to make sure the intended functionality was
>>>>> clear.
>>>>>
>>>>> I had a look at the report and analysis.txt highlights the problem quite
>>>>> well:
>>>>>
>>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error:
>>>>> prmApPostgreSQLDB_fail_60000 failed with rc=2.
>>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op:   Preventing
>>>>> prmApPostgreSQLDB from re-starting anywhere in the cluster
>>>>>
>>>>> It looks like the RA (incorrectly) returned 2 (invalid parameter),
>>>>> instead of 3 (unimplemented function).
>>>>> rc=2 tells the cluster that the configuration is invalid and not to
>>>>> bother starting the resource elsewhere.
>>>>
>>>> !!! Does that mean there might be a problem in the pgsql RA?
>>>>
>>>> Thanks,
>>>> Junko
>>>>
>>>>
>>>
>>> http://hg.linux-ha.org/dev/file/42ce605e3da5/resources/OCF/pgsql
>>>
>>> Look at the end of the script.
>>>
>>> If it is invoked in any other way, it calls usage, which exits with
>>> OCF_ERR_ARGS (i.e. 2). See how it was called. This should be the reason.
>>>
>>> I wonder how this could pass ocf-tester. It does not support any of the
>>> notify operations nor validate-all nor meta-data.
>>>
>>> Or am I looking at the wrong file?
>>
>> You are looking at the right file, and I submitted a patch for this
>> problem a couple of weeks ago.
>>
> And here is one more patch that fixes the problem. Also I have a
> couple of questions:
>
> 1. What is the 'fail' operation supposed to do?

"fail" :-)

> 2. Why isn't the '-F' option described in the help message for crm_resource?

An oversight, I guess.
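
For reference, the return-code issue discussed above can be sketched roughly as follows. This is a minimal illustration, not the actual pgsql RA or the submitted patch: the dispatch function name is hypothetical, and only the numeric exit codes (0, 2, 3) come from the thread and the OCF convention. The idea is that an unrecognised action such as "fail" should yield 3 (unimplemented) rather than 2 (invalid parameter), so the cluster does not conclude that the configuration is invalid everywhere.

```shell
#!/bin/sh
# Sketch only: exit codes follow the OCF resource agent convention.
OCF_SUCCESS=0
OCF_ERR_ARGS=2
OCF_ERR_UNIMPLEMENTED=3

# Hypothetical dispatcher; a real agent defines its own action handlers.
dispatch() {
    case "$1" in
        start|stop|status|monitor)
            # A real agent would call the matching handler here.
            return $OCF_SUCCESS
            ;;
        "")
            # No action supplied at all: a genuine usage error.
            return $OCF_ERR_ARGS
            ;;
        *)
            # Unknown action such as "fail": report it as unimplemented
            # instead of claiming the parameters are invalid.
            return $OCF_ERR_UNIMPLEMENTED
            ;;
    esac
}
```

An agent built this way would typically end with something like `dispatch "$1"; exit $?`, so that the code chosen above becomes the script's exit status.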
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
