Re: [Linux-ha-dev] Ordering of OCF Start, Stop and Monitor actions

Doug Knight Fri, 23 Mar 2007 08:56:20 -0800

I figured this one out, please ignore. The assert is because I didn't give
crm_master a value. If I run crm_master -v 100 at the command line, it spins
right up to 100% CPU with no error.
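
For anyone repeating the manual test, here is a minimal sketch of a guard
that avoids the "attr_value != NULL" assert by refusing to call crm_master
without a value. The function name set_master_preference is hypothetical and
not part of heartbeat. The only crm_master usage assumed is the "-v VALUE"
form already shown in this thread, with the resource ID taken from
OCF_RESOURCE_INSTANCE.

```shell
# Hypothetical helper, not part of heartbeat: call crm_master only when a
# non-empty preference value was supplied.
set_master_preference() {
    value="$1"

    # An empty value is what triggered the non-fatal assert in the manual
    # test, so reject it before crm_master is ever invoked.
    if [ -z "$value" ]; then
        echo "set_master_preference: missing preference value" >&2
        return 1
    fi

    # crm_master picks up the resource ID from OCF_RESOURCE_INSTANCE,
    # which must already be exported by the caller.
    crm_master -v "$value"
}
```

Inside a resource agent this would be called as "set_master_preference 100"
on the preferred master. It only covers the missing-value case and does
nothing about the CPU spin itself.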


Doug

On Fri, 2007-03-23 at 12:52 -0400, Doug Knight wrote:
> This might help. With the resource in a failed mode, but target_role =
> started, I manually ran crm_master, exporting the proper resource ID,
> with the following results:
> 
> [EMAIL PROTECTED] wsi]# vi stateful_pgsql 
> [EMAIL PROTECTED] wsi]# OCF_RESOURCE_INSTANCE=pgsql_wal_5556:0
> [EMAIL PROTECTED] wsi]# export OCF_RESOURCE_INSTANCE
> [EMAIL PROTECTED] wsi]# crm_master -V
> crm_master[7588]: 2007/03/23_12:50:45 ERROR: crm_abort: main:
> Triggered non-fatal assert at crm_attribute.c:353 : attr_value != NULL
> 
> 
> Doug
> 
> On Fri, 2007-03-23 at 12:21 -0400, Doug Knight wrote:
> > Got it. The attached file contains the strace from the second
> > attempt by heartbeat to start the resource up as master, right up
> > until it was killed. The resource already showed failed on the gui.
> > I zipped it up using gzip.
> > 
> > Doug
> > 
> > On Fri, 2007-03-23 at 10:11 -0600, Alan Robertson wrote:  
> > > Doug Knight wrote:
> > > > On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote:
> > > >> Doug Knight wrote:
> > > >> > Current 2.0.8 tarball from 1/18/07. Process in top looks like:
> > > >> > 
> > > > >> >   PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM  TIME+   COMMAND
> > > > >> > 24591 root  18   0 1663m 1.5g 1028 R   83 77.8  1:19.42 /usr/sbin/crm_master -v 100
> > > >> > 
> > > >> > It dies and restarts about every 120 seconds, which happens to be the
> > > >> > timeout I have specified for the stop and start methods.
> > > >> > 
> > > >> > Doug
> > > >> > 
> > > >> > On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote:
> > > >> >> Doug Knight wrote:
> > > >> >> > Hi Alan,
> > > > >> >> > I've started testing my OCF script, and I'm seeing something unusual
> > > > >> >> > during initial startup. I've placed a crm_master call in my
> > > > >> >> > stateful_start function, after the function has determined that it is
> > > > >> >> > running on what should be the master, and postgresql has successfully
> > > > >> >> > started:
> > > >> >> > 
> > > >> >> > crm_master -v 100
> > > >> >> > 
> > > > >> >> > When this command gets executed, it starts using nearly 100% CPU, memory
> > > > >> >> > usage continuously increases up to about 68%, then it dies (killed via
> > > > >> >> > timeout?), followed by a second attempt to go master (with the same
> > > > >> >> > characteristics, after the function timeout is exceeded), then a demote is
> > > > >> >> > sent (again, after timeout) and it switches to try to become the slave
> > > > >> >> > (crm_master -v 10 is what I use, though I'm not sure this is correct
> > > > >> >> > usage to say "I want to change to a slave"). Eventually, I wind up with
> > > > >> >> > the resource in failed mode.
> > > >> >> > 
> > > > >> >> > First question, any idea why the straight line running of a crm_master
> > > > >> >> > -v 100 (not within any loops in my script) would spin up to 100%?
> > > >> >>
> > > > >> >> Bugs maybe?  What version of heartbeat are you running?  Which processes
> > > > >> >> are running up to 100%?  For how long?
> > > >> >>
> > > > >> >> > Second question, is using the crm_master -v with different values the
> > > > >> >> > way to say on which node I prefer the master to run (higher number =
> > > > >> >> > preferred node)?
> > > >> >>
> > > >> >> Yes.  I believe that these are added into the values that come from
> > > >> >> other constraints in your configuration file to come up with a best
> > > >> >> configuration.
> > > >>
> > > >> Good info.
> > > >>
> > > >> Could you provide a few hundred lines of strace output to show us what
> > > >> it's doing?
> > > >>
> > > > 
> > > > Do you mean the last few hundred lines from ha.log? Just the primary
> > > > where I'm trying to start?
> > > 
> > > No, I mean output from the strace command.  From your reply, I'd guess
> > > you've never used it:
> > > 
> > > >   strace -tt -p process-id-of-hung-process -o /some/file
> > > 
> > > Do that for a few seconds, and attach the file to an email to the list.
> > > 
> > > Does that help?
> > > 
> > > 
> > > 
> > _______________________________________________________
> > Linux-HA-Dev: [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/

