I figured this one out, please ignore it: it's because I didn't give it a value. If I run crm_master -v 100 at the command line, it spins right up to 100% CPU with no error.
Doug

On Fri, 2007-03-23 at 12:52 -0400, Doug Knight wrote:
> This might help. With the resource in a failed mode, but target_role =
> started, I manually ran crm_master, exporting the proper resource ID,
> with the following results:
>
> [EMAIL PROTECTED] wsi]# vi stateful_pgsql
> [EMAIL PROTECTED] wsi]# OCF_RESOURCE_INSTANCE=pgsql_wal_5556:0
> [EMAIL PROTECTED] wsi]# export OCF_RESOURCE_INSTANCE
> [EMAIL PROTECTED] wsi]# crm_master -V
> crm_master[7588]: 2007/03/23_12:50:45 ERROR: crm_abort: main:
> Triggered non-fatal assert at crm_attribute.c:353 : attr_value != NULL
>
> Doug
>
> On Fri, 2007-03-23 at 12:21 -0400, Doug Knight wrote:
> > Got it. The attached file contains the strace from the second
> > attempt by heartbeat to start the resource up as master, right up
> > until it was killed. The resource already showed failed on the gui.
> > I zipped it up using gzip.
> >
> > Doug
> >
> > On Fri, 2007-03-23 at 10:11 -0600, Alan Robertson wrote:
> > > Doug Knight wrote:
> > > > On Fri, 2007-03-23 at 09:25 -0600, Alan Robertson wrote:
> > > >> Doug Knight wrote:
> > > >> > Current 2.0.8 tarball from 1/18/07. Process in top looks like:
> > > >> >
> > > >> > PID   USER PR NI VIRT  RES  SHR  S %CPU %MEM TIME+   COMMAND
> > > >> > 24591 root 18 0  1663m 1.5g 1028 R 83   77.8 1:19.42 /usr/sbin/crm_master -v 100
> > > >> >
> > > >> > It dies and restarts about every 120 seconds, which happens to be the
> > > >> > timeout I have specified for the stop and start methods.
> > > >> >
> > > >> > Doug
> > > >> >
> > > >> > On Fri, 2007-03-23 at 08:20 -0600, Alan Robertson wrote:
> > > >> >> Doug Knight wrote:
> > > >> >> > Hi Alan,
> > > >> >> > I've started testing my OCF script, and I'm seeing something
> > > >> >> > unusual during initial startup. I've placed a crm_master call
> > > >> >> > in my stateful_start function, after the function has
> > > >> >> > determined that it is running on what should be the master,
> > > >> >> > and postgresql has successfully started:
> > > >> >> >
> > > >> >> > crm_master -v 100
> > > >> >> >
> > > >> >> > When this command gets executed, it starts using nearly 100%
> > > >> >> > CPU, memory usage continuously increases up to about 68%, then
> > > >> >> > it dies (killed via timeout?), followed by a second attempt to
> > > >> >> > go master (with the same characteristics, after the function
> > > >> >> > timeout is exceeded), then a demote is sent (again, after
> > > >> >> > timeout) and it switches to try to become the slave
> > > >> >> > (crm_master -v 10 is what I use, though I'm not sure this is
> > > >> >> > correct usage to say "I want to change to a slave").
> > > >> >> > Eventually, I wind up with the resource in failed mode.
> > > >> >> >
> > > >> >> > First question, any idea why the straight line running of a
> > > >> >> > crm_master -v 100 (not within any loops in my script) would
> > > >> >> > spin up to 100%?
> > > >> >>
> > > >> >> Bugs maybe? What version of heartbeat are you running? Which
> > > >> >> processes are running up to 100%? For how long?
> > > >> >>
> > > >> >> > Second question, is using the crm_master -v with different
> > > >> >> > values the way to say on which node I prefer the master to run
> > > >> >> > (higher number = preferred node)?
> > > >> >>
> > > >> >> Yes. I believe that these are added into the values that come from
> > > >> >> other constraints in your configuration file to come up with a best
> > > >> >> configuration.
> > > >>
> > > >> Good info.
> > > >>
> > > >> Could you provide a few hundred lines of strace output to show us what
> > > >> it's doing?
> > > >
> > > > Do you mean the last few hundred lines from ha.log? Just the primary
> > > > where I'm trying to start?
> > >
> > > No, I mean output from the strace command. From your reply, I'd guess
> > > you've never used it:
> > >
> > > strace -tt -p process-id-of-hung-process > /some/file
> > >
> > > Do that for a few seconds, and attach the file to an email to the list.
> > >
> > > Does that help?
_______________________________________________________
Linux-HA-Dev: [email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
Home Page: http://linux-ha.org/
