Bjorn Oglefjorn wrote:
> Thanks for the reply Dejan. My responses are inline.
> --BO
>
> On 3/28/07, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
>>
>> On Wed, Mar 28, 2007 at 11:29:35AM -0400, Bjorn Oglefjorn wrote:
>> > I believe I've corrected some issues, but now I'm getting more of this:
>> > Mar 28 11:02:37 test-1 lrmd: [22008]: ERROR: RA lsb:httpd:monitor (process
>> > 24472) failed to redirect stdout for its background child (daemon)
>> > processes. This will likely cause those processes to die mysteriously at
>> > some later time (terminated by signal SIGPIPE).
>>
>> Hmm, I think that this has been addressed as Alan had already
>> pointed out, probably after the 2.0.7 release. If you can, please
>> upgrade to 2.0.8.
>
>
> I'd prefer to stick with the package that comes from CentOS extras (2.0.7).
> I don't get this error all the time, so I'm not sure why it's happening.
> Can someone give me a deeper explanation of what the lrmd doesn't like
> here?
>
>> > When I attempt to move resources to another node (using crm_standby) I get
>> > these errors:
>> > Mar 28 10:56:04 test-1 crmd: [22011]: info: do_lrm_rsc_op:lrm.c Performing
>> > op stop on httpd (interval=0ms, key=28:66532759-6190-4321-9be3-07730b15aeae)
>> > Mar 28 10:56:04 test-1 lrmd: [22773]: WARN: For LSB init script, no
>> > additional parameters are needed.
>>
>> Can't say unless you show me this rsc definition, but it seems
>> like bad usage. I found one below, but that one should not cause
>> this problem:
>
>
> It's slightly different now (is provider="heartbeat" bad here?):
>
> <primitive class="lsb" id="httpd" provider="heartbeat" type="httpd-lsb">
>   <operations>
>     <op id="httpd_mon" interval="5s" name="monitor" timeout="20s"
>         on_fail="restart"/>
>     <op id="httpd_start" name="start" timeout="20s"
>         on_fail="restart" prereq="fencing"/>
>     <op id="httpd_stop" name="stop" timeout="20s"
>         on_fail="restart" prereq="fencing"/>
>   </operations>
> </primitive>
>
>> > <primitive class="lsb" id="httpd" provider="heartbeat" type="httpd">
>> >   <operations>
>> >     <op id="httpd_status" interval="5s" name="status" timeout="20s"
>> >         on_fail="fence"/>
>> >   </operations>
>> > </primitive>
>>
>> One thing that looks odd is 5s interval and 20s timeout. The
>> timeout is probably OK, but the interval is a bit exaggerated.
>> What I mean is that, apart from putting extra strain on your host
>> which may or may not be an issue, a 5 seconds monitoring interval
>> won't bring you much, or, in other words, how about your response
>> time in case a problem occurs? Is it of the same order?
>
>
> Would it make more sense to have the timeout and interval equal? I can see
> your point.
>
>> > Mar 28 10:56:04 test-1 lrmd: [22008]: ERROR: RA lsb:httpd:stop (process
>> > 22773) failed to redirect stdout for its background child (daemon)
>> > processes. This will likely cause those processes to die mysteriously at
>> > some later time (terminated by signal SIGPIPE).
>> > Mar 28 10:56:04 test-1 lrmd: [22008]: info: RA output: (httpd:stop:stdout)
>> > httpd (pid 22165 22164 22163 22162 22161 22160 22159 22157 22155) is
>> > running...
>> > Mar 28 10:56:04 test-1 crmd: [22011]: WARN: process_lrm_event:lrm.c LRM
>> > operation (44) stop_0 on httpd Error: (1) unknown error
>>
>> I'd strongly recommend that you use the OCF RA instead of your
>> distribution's init script. It is otherwise rather difficult to
>> figure out what this error means apart from the fact that the stop
>> op failed. I wonder why it showed up as WARN and not ERROR.
I agree. Also, our resource agent monitors Apache much better than the
status action of the LSB init script does.
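
For illustration only, here is a rough sketch of what the primitive might
look like with the OCF agent instead of the init script. It assumes the
agent is installed as ocf:heartbeat:apache, that it takes a "configfile"
parameter, and that your Apache configuration lives at
/etc/httpd/conf/httpd.conf. The ids and the 30s monitor interval are
placeholders, so check all of these against your own installation.

```xml
<primitive class="ocf" id="httpd" provider="heartbeat" type="apache">
  <operations>
    <!-- interval and timeout are placeholder values, not recommendations -->
    <op id="httpd_mon" interval="30s" name="monitor" timeout="20s"/>
  </operations>
  <instance_attributes id="httpd_attrs">
    <attributes>
      <!-- assumed path: adjust to wherever your httpd.conf actually is -->
      <nvpair id="httpd_configfile" name="configfile"
              value="/etc/httpd/conf/httpd.conf"/>
    </attributes>
  </instance_attributes>
</primitive>
```

With class="ocf" the provider attribute is what selects the agent
directory, which is not the case for class="lsb".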
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce