On Thu, Apr 17, 2008 at 01:03:09PM +0200, Andrew Beekhof wrote:
> On Thu, Apr 17, 2008 at 12:58 PM, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> >
> > On Thu, Apr 17, 2008 at 12:56 PM, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> > > On Thu, Apr 17, 2008 at 12:35 PM, Luis Motta Campos
> > > <[EMAIL PROTECTED]> wrote:
> > > > Dejan Muhamedagic wrote:
> > > > > Hi,
> > > > >
> > > > >> respawn hacluster /usr/lib64/heartbeat/ipfail
> > > > >
> > > > > ipfail doesn't work with crm. You should use pingd instead.
> > > >
> > > > Well, I don't think this helps. :( I'm using the suggested
> > > > (reasonable for me) defaults:
> > > >
> > > > respawn root /usr/lib64/heartbeat/pingd -m 100 -d 5s
> > > >
> > > > (yes, I'm running CentOS x86_64).
> > > >
> > > > I still have problems, but they seem to be worse, now. Before, if I
> > > > restarted heartbeat (/etc/init.d/heartbeat restart), any service
> > > > running on the machine jumped away before the restart, and
> > > > heartbeat was able to restart ok.
> > > >
> > > > Using pingd instead of the ipfail, even this is crippled, and
> > > > heartbeat reboots the peer host (the one supposed to keep services
> > > > running) if I try to restart the heartbeat service on one of the
> > > > machines.
> > > >
> > > > I presume I'm doing something really stupid, but I can't understand
> > > > it. Please help me out. I used hb_report to fetch all I know about
> > > > my system, please find the report attached.
> > > >
> > >
> > > random question - did you install from source or packages? where did
> > > you get them from?
> > >
> >
> > and a followup... you cant just make up values for target_role:
> >
> > <nvpair name="target_role" value="Started:Master"
> > id="d54bdbb8-5d79-4d12-a95f-9b9b015176e3"/>
> >
> > makes no sense. just "Master" would be correct
> >
>
> Then there is the failed start operation... that wont be helping at all.
>
> pengine[13743]: 2008/04/17_12:23:22 WARN: unpack_rsc_op: Processing
> failed op database-filesystem_start_0 on db-sql1.ripe.net: Error
>
> And finally, it looks like there was a crash in the pengine process.
>
> crmd[12352]: 2008/04/17_12:23:22 WARN: Managed pengine process 13743
> killed by signal 11 [SIGSEGV - Segmentation violation].
> crmd[12352]: 2008/04/17_12:23:22 ERROR: Managed pengine process 13743
> dumped core
>
> can you have a look for a core file in
> /var/lib/heartbeat/cores/hacluster/ and post the backtrace?

Hmm, again hb_report didn't produce a backtrace; this time there's not
even the header/footer printed by echo(1). That's really strange.

Luis: Did you see any errors while running hb_report?

Thanks,

Dejan

> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
