On Wed, Jun 25, 2008 at 6:15 AM, Serge Dubrouski <[EMAIL PROTECTED]> wrote:
> On Wed, Jun 25, 2008 at 5:29 AM, Dominik Klein <[EMAIL PROTECTED]> wrote:
>> Junko IKEDA wrote:
>>>>>>>
>>>>>>> Unfortunately, the latest package produced the same results.
>>>>>>> pgsql couldn't fail over using crm_resource -F.
>>>>>>
>>>>>> I think you perhaps misunderstand what -F does... it is intended to
>>>>>> tell the cluster that the resource failed.
>>>>>> Although it may move as well (depending on how you set up the scores),
>>>>>> this is not the primary goal.
>>>>>
>>>>> pgsql is configured to move to the other node if it fails.
>>>>> If crm_resource -F is called, pgsql's fail-count would be increased
>>>>> from 0 to 1, so pgsql should move to the appropriate node.
>>>>> But pgsql was just stopped, and not moved.
>>>>> Other resources were still running.
>>>>
>>>> Ah ok, sorry just wanted to make sure the intended functionality was
>>>> clear.
>>>>
>>>> I had a look at the report and analysis.txt highlights the problem quite
>>>> well:
>>>>
>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op: Hard error:
>>>> prmApPostgreSQLDB_fail_60000 failed with rc=2.
>>>> pengine[20727]: 2008/06/23_11:02:40 ERROR: unpack_rsc_op:   Preventing
>>>> prmApPostgreSQLDB from re-starting anywhere in the cluster
>>>>
>>>> It looks like the RA (incorrectly) returned 2 (invalid parameter),
>>>> instead of 3 (unimplemented function).
>>>> rc=2 tells the cluster that the configuration is invalid and not to
>>>> bother starting the resource elsewhere.
>>>
>>> !!! Does that mean there might be a problem in the pgsql RA?
>>>
>>> Thanks,
>>> Junko
>>>
>>>
>>
>> http://hg.linux-ha.org/dev/file/42ce605e3da5/resources/OCF/pgsql
>>
>> Look at the end of the script.
>>
>> If it is invoked in any other way, it calls usage which exits OCF_ERR_ARGS
>> (ie 2). See how it was called. This should be the reason.
>>
>> I wonder how this could pass ocf-tester. It does not support any of the
>> notify operations nor validate-all nor meta-data.
>>
>> Or am I looking at the wrong file?
>
> You are looking at the right file, and I submitted a patch for this
> problem a couple of weeks ago.
>
And here is one more patch that fixes the problem. Also I have a
couple of questions:

1. What is the 'fail' operation supposed to do?
2. Why isn't the '-F' option described in the help message for crm_resource?
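
To make the return-code difference concrete, here is a rough standalone sketch of the dispatch logic with 'fail' added to the unimplemented list. It is not the real pgsql RA: the dispatch function and the start/status/monitor branch are hypothetical stand-ins, and the catch-all branch only assumes what was described above (any other invocation ends up in usage, which exits OCF_ERR_ARGS).

```shell
#!/bin/sh
# Illustrative sketch only, not the actual pgsql resource agent.
# Numeric values are the OCF return codes mentioned in this thread.
OCF_SUCCESS=0
OCF_ERR_ARGS=2            # "invalid parameter": cluster will not restart it anywhere
OCF_ERR_UNIMPLEMENTED=3   # "unimplemented function"

# dispatch is a hypothetical helper that returns the code the RA would exit with.
dispatch() {
    case "$1" in
        start|stop|status|monitor)
            # Stand-in for the real pgsql_* functions (assumed to succeed here).
            return $OCF_SUCCESS;;
        notify|promote|demote|fail)
            # With the patch, 'fail' is reported as unimplemented.
            return $OCF_ERR_UNIMPLEMENTED;;
        *)
            # Assumed fall-through: any other action is treated as a usage error.
            return $OCF_ERR_ARGS;;
    esac
}

for action in stop fail bogus; do
    dispatch "$action"
    echo "$action -> rc=$?"
done
```

Without 'fail' in the second branch, the 'fail' action would fall through to the last branch and return 2, which matches the "Hard error" reported by pengine.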

-- 
Serge Dubrouski.
--- resources/OCF/pgsql.dist	2008-06-25 08:53:12.000000000 -0400
+++ resources/OCF/pgsql	2008-06-25 08:50:06.000000000 -0400
@@ -418,7 +418,7 @@
 
     stop)       pgsql_stop
                 exit $?;;
-    notify|promote|demote)
+    notify|promote|demote|fail)
                 exit $OCF_ERR_UNIMPLEMENTED;;
 esac
 
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
