Hi,

#4 0x0805add0 in build_operation_update (xml_rsc=0x8109430, op=0x8282b98,
    src=0x80692d9 "do_update_resource", lpc=0) at lrm.c:347
(gdb) print *0x8282b98
$1 = 136448640

If you want I can send you the core off list. I keep all the cores :)

Cheers,

Dejan

On Thu, Apr 20, 2006 at 09:15:28AM +0200, Andrew Beekhof wrote:
> On 4/20/06, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > Hello,
> >
> > Running CTS with HEAD hung the cluster after crmd dumped core
> > (abort). It happened after 53 tests with this curious message:
> >
> > Apr 19 17:48:01 BadNews: Apr 19 17:42:48 sapcl01 crmd: [17937]: ERROR:
> > mask(lrm.c:build_operation_update): Triggered non-fatal assert at
> > lrm.c:349: fsa_our_dc_version != NULL
>
> We have two kinds of asserts... neither is supposed to happen, and
> both create a core file so that we can diagnose how we got there.
> However, the non-fatal ones call fork() first (so the main process
> doesn't die) and then take some recovery action.
>
> Sometimes the non-fatal varieties are used in new pieces of code to
> make sure they work as we expect, and that is what has happened here.
>
> Do you still have the core file?
> I'd be interested to know the result of:
>     print *op
> from frame #4
>
> In the meantime, I'll look at the logs and see what I can figure out.
>
> > Apr 19 17:48:01 BadNews: Apr 19 17:42:48 sapcl01 crmd: [17937]: ERROR:
> > Exiting untracked process process 19654 dumped core
> > Apr 19 17:48:01 BadNews: Apr 19 17:45:49 sapcl01 crmd: [17937]: ERROR:
> > mask(utils.c:crm_timer_popped): Finalization Timer (I_ELECTION) just popped!
> >
> > The cluster looks like this, unchanged for several hours:
> >
> > ============
> > Last updated: Thu Apr 20 04:43:47 2006
> > Current DC: sapcl01 (85180fd0-70c9-4136-a5e0-90d89ea6079d)
> > 3 Nodes configured.
> > 3 Resources configured.
> > ============
> >
> > Node: sapcl03 (0bfb78a2-fcd2-4f52-8a06-2d17437a6750): online
> > Node: sapcl02 (09fa194c-d7e1-41fa-a0d0-afd79a139181): online
> > Node: sapcl01 (85180fd0-70c9-4136-a5e0-90d89ea6079d): online
> >
> > Resource Group: group_1
> >     IPaddr_1      (heartbeat::ocf:IPaddr):     Started sapcl03
> >     LVM_2         (heartbeat::ocf:LVM):        Stopped
> >     Filesystem_3  (heartbeat::ocf:Filesystem): Stopped
> > Resource Group: group_2
> >     IPaddr_2      (heartbeat::ocf:IPaddr):     Started sapcl02
> >     LVM_3         (heartbeat::ocf:LVM):        Started sapcl02
> >     Filesystem_4  (heartbeat::ocf:Filesystem): Started sapcl02
> > Resource Group: group_3
> >     IPaddr_3      (heartbeat::ocf:IPaddr):     Started sapcl03
> >     LVM_4         (heartbeat::ocf:LVM):        Started sapcl03
> >     Filesystem_5  (heartbeat::ocf:Filesystem): Started sapcl03
> >
> > And:
> >
> > sapcl01# crmadmin -S sapcl01
> > Status of [EMAIL PROTECTED]: S_TERMINATE (ok)
> >
> > All processes are still running on this node, but heartbeat seems
> > to be in some kind of limbo.
> >
> > Cheers,
> >
> > Dejan
> >
> > _______________________________________________________
> > Linux-HA-Dev: Linux-HA-Dev@lists.linux-ha.org
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha-dev
> > Home Page: http://linux-ha.org/