Doug Knight wrote:
> On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote:
>> Doug Knight wrote:
>> > Current 2.0.8 tarball from 1/18/07. Process in top looks like:
>> > 
>> >   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
>> > 24591 root  18   0 1663m 1.5g 1028 R   83 77.8  1:19.42 /usr/sbin/crm_master -v 100
>> > 
>> > It dies and restarts about every 120 seconds, which happens to be the
>> > timeout I have specified for the stop and start methods.
>> > 
>> > Doug
>> > 
>> > On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote:
>> >> Doug Knight wrote:
>> >> > Hi Alan,
>> >> > I've started testing my OCF script, and I'm seeing something unusual
>> >> > during initial startup. I've placed a crm_master call in my
>> >> > stateful_start function, after the function has determined that it is
>> >> > running on what should be the master, and postgresql has successfully
>> >> > started:
>> >> > 
>> >> > crm_master -v 100
>> >> > 
>> >> > When this command gets executed, it starts using nearly 100% CPU, memory
>> >> > usage continuously increases up to about 68%, then it dies (killed via
>> >> > timeout?), followed by a second attempt to go master (with the same
>> >> > characteristics, after the function timeout is exceeded), then a demote is
>> >> > sent (again, after timeout) and it switches to try to become the slave
>> >> > (crm_master -v 10 is what I use, though I'm not sure this is correct
>> >> > usage to say "I want to change to a slave"). Eventually, I wind up with
>> >> > the resource in failed mode.
>> >> > 
>> >> > First question, any idea why the straight line running of a crm_master
>> >> > -v 100 (not within any loops in my script) would spin up to 100%?
>> >>
>> >> Bugs maybe?  What version of heartbeat are you running?  Which processes
>> >> are running up to 100%?  For how long?
>> >>
>> >> > Second question, is using the crm_master -v with different values the
>> >> > way to say on which node I prefer the master to run (higher number =
>> >> > preferred node)?
>> >>
>> >> Yes.  I believe that these are added into the values that come from
>> >> other constraints in your configuration file to come up with a best
>> >> configuration.
>>
>> Good info.
>>
>> Could you provide a few hundred lines of strace output to show us what
>> it's doing?
>>
> 
> Do you mean the last few hundred lines from ha.log? Just the primary
> where I'm trying to start?

No, I mean output from the strace command.  From your reply, I'd guess
you've never used it:

  strace -tt -p process-id-of-hung-process -o /some/file

Do that for a few seconds, and attach the file to an email to the list.
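
If it helps, here is a rough sketch of one way to grab just a few seconds of
trace. The function name capture_strace, the 5-second default and the /tmp
output path are made up for illustration; the only real tools assumed are
strace, kill and coreutils timeout. Note that strace writes the trace to
stderr by default, so the sketch uses -o to send it to a file.

```shell
#!/bin/sh
# Hypothetical helper (not part of heartbeat): attach strace to a running
# process for a few seconds and save the trace to a file.
# Usage: capture_strace PID [SECONDS] [OUTFILE]
capture_strace() {
    pid=$1
    secs=${2:-5}                      # assumed default: 5 seconds
    out=${3:-/tmp/strace.$pid.txt}    # assumed default output path

    if [ -z "$pid" ]; then
        echo "usage: capture_strace PID [SECONDS] [OUTFILE]" >&2
        return 2
    fi

    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no such process (or no permission): $pid" >&2
        return 1
    fi

    # -tt prefixes each line with a microsecond timestamp,
    # -p attaches to the running process, -o writes the trace to a file.
    timeout "$secs" strace -tt -p "$pid" -o "$out"
    rc=$?

    # timeout exits with 124 when it had to stop strace, which is the
    # expected outcome here, so treat it as success.
    [ "$rc" -eq 124 ] && rc=0
    return "$rc"
}
```

For the process in the top output above, that would be something like
"capture_strace 24591 5 /tmp/crm_master.strace", run as root while
crm_master is spinning.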

Does that help?



-- 
    Alan Robertson <[EMAIL PROTECTED]>

"Openness is the foundation and preservative of friendship...  Let me
claim from you at all times your undisguised opinions." - William
Wilberforce
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
