On Mon, Oct 20, 2008 at 02:56:40PM +0200, Stefan Ott wrote:
> Simon Horman wrote:
>> On Fri, Oct 10, 2008 at 05:55:02PM +0200, Stefan Ott wrote:
>>> Simon Horman wrote:
>>>> On Fri, Oct 10, 2008 at 09:40:11AM +1100, Simon Horman wrote:
>>>>> Hi Stefan,
>>>>>
>>>>> I think that there is a silly parsing bug. Can you please try:
>>>>>
>>>>> checkcommand = /usr/local/sbin/check_lustre_on_realserver
>>>>>
>>>>> Instead of
>>>>>
>>>>> checkcommand = "/usr/local/sbin/check_lustre_on_realserver"
>>>> Hi Stefan,
>>>>
>>>> could you try the following patch to see if it solves your
>>>> problem without needing to update the configuration file?
>>>>
>>>> Thanks
>>>>
>>> Hi Simon
>>>
>>> I tried both (removing the quotes and applying your patch), neither of
>>> which helped. Any other ideas?
>>
>> Hi Stefan,
>>
>> sorry for taking a while to look into this. You do need to make
>> the change above or apply the patch above. But there is also another
>> change needed.
>>
>> The signal handling has been set up to auto-reap children,
>> which is needed for the case where they time out and thus
>> aren't reaped by the waitpid() that is called inside system().
>>
>> However, due to the wonders of Perl, setting autoreap actually
>> changes the return value of waitpid() from the child's PID (> 0)
>> to -1. This breaks system() and is the root cause of the problem
>> you are seeing.
>>
>> My proposed fix is below. It sets up a signal handler that
>> just reaps the children as necessary. And it does this globally,
>> as there seems to be no good reason not to.
>>
>> An alternative, which is just a work-around, is to simply remove
>> the "local $SIG{CHLD} = 'IGNORE';" which appears around line 2290.
>> This, however, will lead to zombies if your check process ever
>> times out.
Thanks, I've merged it into the dev tree.
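
The waitpid() behaviour discussed above can be reproduced with a small
stand-alone script. This is a minimal sketch, not the ldirectord patch
itself: run_check() and reap_children() are hypothetical names made up
for illustration, and the child is just perl exiting with status 3.
On Linux the first call should return -1 (the kernel reaps the child
before system() can collect it), while the second should return the
normal wait status.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

# Hypothetical helper (not from ldirectord): run a command through
# system() with the given SIGCHLD disposition and return system()'s
# raw return value.
sub run_check {
    my ($disposition, @cmd) = @_;
    local $SIG{CHLD} = $disposition;
    return system(@cmd);
}

# Reap any children that have already exited, without blocking.
# $! and $? are localised so the handler cannot clobber the status
# seen by the code that was running when the signal arrived.
sub reap_children {
    local ($!, $?);
    1 while waitpid(-1, WNOHANG) > 0;
}

# The child is this same perl binary, exiting with status 3.
my @child = ($^X, '-e', 'exit 3');

my $ignored = run_check('IGNORE', @child);
my $reaped  = run_check(\&reap_children, @child);

printf "IGNORE:  system() returned %d\n", $ignored;
printf "handler: system() returned %d (exit status %d)\n",
    $reaped, $reaped >> 8;
```

With the handler in place, children that time out and are never
waited for by system() still get collected the next time SIGCHLD is
delivered, so no zombies are left behind.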
--
Simon Horman
VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
H: www.vergenet.net/~horms/ W: www.valinux.co.jp/en