Doug Knight wrote:
> On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote:
>> Doug Knight wrote:
>> > Current 2.0.8 tarball from 1/18/07. Process in top looks like:
>> >
>> > PID   USER PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
>> > 24591 root 18 0  1663m 1.5g 1028 R 83   77.8 1:19.42 /usr/sbin/crm_master -v 100
>> >
>> > It dies and restarts about every 120 seconds, which happens to be the
>> > timeout I have specified for the stop and start methods.
>> >
>> > Doug
>> >
>> > On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote:
>> >> Doug Knight wrote:
>> >> > Hi Alan,
>> >> > I've started testing my OCF script, and I'm seeing something unusual
>> >> > during initial startup. I've placed a crm_master call in my
>> >> > stateful_start function, after the function has determined that it is
>> >> > running on what should be the master, and postgresql has successfully
>> >> > started:
>> >> >
>> >> > crm_master -v 100
>> >> >
>> >> > When this command gets executed, it starts using nearly 100% CPU, memory
>> >> > usage continuously increases up to about 68%, then it dies (killed via
>> >> > timeout?), followed by a second attempt to go master (with the same
>> >> > characteristics, after the function timeout is exceeded), then a demote is
>> >> > sent (again, after timeout) and it switches to try to become the slave
>> >> > (crm_master -v 10 is what I use, though I'm not sure this is correct
>> >> > usage to say "I want to change to a slave"). Eventually, I wind up with
>> >> > the resource in failed mode.
>> >> >
>> >> > First question, any idea why the straight line running of a crm_master
>> >> > -v 100 (not within any loops in my script) would spin up to 100%?
>> >>
>> >> Bugs maybe? What version of heartbeat are you running? Which processes
>> >> are running up to 100%? For how long?
>> >>
>> >> > Second question, is using the crm_master -v with different values the
>> >> > way to say on which node I prefer the master to run (higher number =
>> >> > preferred node)?
>> >>
>> >> Yes. I believe that these are added into the values that come from
>> >> other constraints in your configuration file to come up with a best
>> >> configuration.
>>
>> Good info.
>>
>> Could you provide a few hundred lines of strace output to show us what
>> it's doing?
>>
>
> Do you mean the last few hundred lines from ha.log? Just the primary
> where I'm trying to start?
No, I mean output from the strace command. From your reply, I'd guess
you've never used it:
strace -tt -p process-id-of-hung-process -o /some/file
Do that for a few seconds, and attach the file to an email to the list.
Does that help?
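
If it is easier, here is a minimal sketch that wraps the strace step in a
small script so the trace stops by itself after a few seconds. The function
names, the 5-second window and the output path are my own placeholders and
not part of heartbeat. It assumes strace and the coreutils timeout command
are installed and that you run it as root on the node where crm_master is
spinning.

```shell
#!/bin/sh
# Sketch only: names, durations and paths below are illustrative assumptions.

# Print the strace command line for a given PID and output file.
#   -tt  prefix every line with a timestamp (microsecond resolution)
#   -p   attach to an already-running process
#   -o   write the trace to a file (strace prints to stderr by default,
#        so a plain "> file" redirect would leave the file empty)
strace_cmd() {
    printf 'strace -tt -p %s -o %s' "$1" "$2"
}

# Attach to PID $1 for $2 seconds and save the trace in file $3.
# timeout sends SIGTERM when the time is up; strace then detaches and
# leaves the traced process running. The unquoted $(...) is deliberate
# so the command is split into words; avoid spaces in the output path.
trace_for() {
    timeout "$2" $(strace_cmd "$1" "$3")
}
```

For example, trace_for 24591 5 /tmp/crm_master.strace would trace the PID
shown in your top output for five seconds. Since the process restarts about
every 120 seconds, look up the current PID first (top, or something like
pgrep -f crm_master).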
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/