Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

Doug Knight Fri, 23 Mar 2007 08:21:56 -0800

Got it. The attached file contains the strace output from heartbeat's
second attempt to start the resource as master, right up until it was
killed. The resource had already shown as failed in the GUI. I
compressed the file with gzip.


Doug

On Fri, 2007-03-23 at 10:11 -0600, Alan Robertson wrote:
> Doug Knight wrote:
> > On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote:
> >> Doug Knight wrote:
> >> > Current 2.0.8 tarball from 1/18/07. Process in top looks like:
> >> > 
> >> >   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
> >> > 24591 root  18   0 1663m 1.5g 1028 R   83 77.8  1:19.42 /usr/sbin/crm_master -v 100
> >> > 
> >> > It dies and restarts about every 120 seconds, which happens to be the
> >> > timeout I have specified for the stop and start methods.
> >> > 
> >> > Doug
> >> > 
> >> > On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote:
> >> >> Doug Knight wrote:
> >> >> > Hi Alan,
> >> >> > I've started testing my OCF script, and I'm seeing something unusual
> >> >> > during initial startup. I've placed a crm_master call in my
> >> >> > stateful_start function, after the function has determined that it is
> >> >> > running on what should be the master, and postgresql has successfully
> >> >> > started:
> >> >> > 
> >> >> > crm_master -v 100
> >> >> > 
> >> >> > When this command gets executed, it starts using nearly 100% CPU and
> >> >> > its memory usage climbs steadily to about 68%; then it dies (killed
> >> >> > via timeout?). A second attempt to go master follows, with the same
> >> >> > characteristics, once the function timeout is exceeded. Then a demote
> >> >> > is sent (again, after the timeout) and it switches to trying to become
> >> >> > the slave (crm_master -v 10 is what I use, though I'm not sure this is
> >> >> > the correct way to say "I want to change to a slave"). Eventually, I
> >> >> > wind up with the resource in failed mode.
> >> >> > 
> >> >> > First question: any idea why a single, straight-line call to
> >> >> > crm_master -v 100 (not inside any loop in my script) would spin up to
> >> >> > 100% CPU?
> >> >>
> >> >> Bugs maybe?  What version of heartbeat are you running?  Which processes
> >> >> are running up to 100%?  For how long?
> >> >>
> >> >> > Second question: is calling crm_master -v with different values the
> >> >> > way to say on which node I prefer the master to run (higher number =
> >> >> > preferred node)?
> >> >>
> >> >> Yes.  I believe that these are added into the values that come from
> >> >> other constraints in your configuration file to come up with a best
> >> >> configuration.
> >>
> >> Good info.
> >>
> >> Could you provide a few hundred lines of strace output to show us what
> >> it's doing?
> >>
> > 
> > Do you mean the last few hundred lines from ha.log? Just the primary
> > where I'm trying to start?
> 
> No, I mean output from the strace command.  From your reply, I'd guess
> you've never used it:
> 
>   strace -tt -p process-id-of-hung-process -o /some/file
> 
> Do that for a few seconds, and attach the file to an email to the list.
> 
> Does that help?
> 
> 
>

crm_master.strace.gz
Description: GNU Zip compressed data
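A minimal shell sketch of the capture step Alan describes, assuming the spinning process can be found with pgrep and that a few seconds of trace is enough. The function name capture_strace, the default output path, and the five-second window are hypothetical choices, not something from the thread. strace writes its trace to standard error by default, so the sketch uses -o to send it to a file; timeout (GNU coreutils) ends the capture, and strace detaches from the traced process when it is terminated, leaving that process running.

```shell
#!/bin/sh
# Sketch only: capture a few seconds of system calls from a hung/spinning
# process and gzip the result for attaching to a list post.
# Hypothetical: the function name, the 5-second default, the /tmp path.

capture_strace() {
    pattern=$1                                # e.g. "crm_master -v 100"
    seconds=${2:-5}                           # how long to trace
    out=${3:-/tmp/crm_master.strace}          # hypothetical output path

    # Oldest process whose full command line matches the pattern.
    # Fails (and we bail out) if nothing matches or pgrep is missing.
    pid=$(pgrep -o -f "$pattern") || {
        echo "no process matching '$pattern'" >&2
        return 1
    }

    # -tt: wall-clock timestamps with microseconds
    # -p:  attach to the already-running PID
    # -o:  write the trace to a file instead of stderr
    # timeout sends SIGTERM after $seconds; strace then detaches.
    timeout "$seconds" strace -tt -p "$pid" -o "$out"

    # Compress the trace and print the name of the file to attach.
    gzip -f "$out" && echo "$out.gz"
}
```

Usage would be along the lines of `capture_strace "crm_master -v 100"`, run as root while the process is spinning.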

_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
