Re: [Linux-HA] crmd: [4681]: ERROR: Exiting pengine process 4689 dumped core

Anders Brownworth Tue, 12 Jun 2007 15:13:23 -0700

So to add some color to this problem, I'm seeing these errors on box02:


ERROR: Exiting pengine process 4689 dumped core
ERROR: crmdManagedChildDied:subsystems.c The pengine subsystem terminated unexpectedly
ERROR: do_log:misc.c [[FSA]] Input I_ERROR from crmdManagedChildDied() received in state (S_TRANSITION_ENGINE)
ERROR: do_recover:control.c Action A_RECOVER (0000000001000000) not supported
ERROR: do_log:misc.c [[FSA]] Input I_STOP from do_recover() received in state (S_RECOVERY)

This is a two-box setup that migrates an IP address and a service (OpenSER)
so they run on one of the nodes at any given time. Box01 doesn't log any
errors, and leaving the nodes alone to sort it out doesn't help. I'm unsure
what normal cib.xml behavior looks like because I haven't seen it work any
other way. Is it normal for that file to be rewritten every 5 seconds or so?
That seems abnormal to me, but I can't get it confirmed in the docs.
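
A minimal sketch for putting a number on the rewrite rate, assuming Python 3
is available on the node. It only polls the file's modification time; the
function name count_rewrites and its parameters are hypothetical and not part
of heartbeat.

```python
import os
import time


def count_rewrites(path, duration_s=60.0, poll_s=0.5):
    """Poll a file's mtime and return how many times it changed.

    Hypothetical helper: changes that happen faster than poll_s
    are merged into one, so the result is a lower bound.
    """
    changes = 0
    last = os.stat(path).st_mtime_ns
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        time.sleep(poll_s)
        try:
            current = os.stat(path).st_mtime_ns
        except FileNotFoundError:
            # The file may be briefly absent if a writer removes
            # and recreates it; skip this poll and try again.
            continue
        if current != last:
            changes += 1
            last = current
    return changes
```

Calling count_rewrites("/var/lib/heartbeat/crm/cib.xml") on each box for a
minute gives a rough count that can be quoted alongside the logs.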

I'm happy to grab any other information if it might help. Thanks.

-Anders

On 6/12/07, Anders Brownworth <[EMAIL PROTECTED]> wrote:


Andrew,

Sorry about the formatting. Attached are the /var/log/messages output of
box01 and box02 as well as the cib.xml. Also, this is heartbeat version
2.0.7.

As you can see in the messages, this isn't a problem that gets worked out.
Both machines keep writing and re-writing their cib.xml files all day
long. I gather this isn't normal.

Thanks for any light you can shed.

-Anders

On 6/12/07, Andrew Beekhof < [EMAIL PROTECTED]> wrote:
>
> On 6/12/07, Anders Brownworth < [EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > I have a heartbeat v2 setup that I am trying to use to migrate OpenSER
> > and an IP address back and forth between 2 boxes (box01 and box02). I
> > wrote an OCF resource agent for OpenSER and am using the
> > heartbeat-provided IPaddr to manage the IP address. My OCF agent checks
> > out with the ocf-tester script, and my cib.xml verifies with
> > crm_verify -x /var/lib/heartbeat/crm/cib.xml.
> >
> > When I start both nodes with exactly the same configuration, they fight
> > about the state of things, and the first ERROR I get is:
> >
> > crmd: [3507]: ERROR: do_exit:control.c Could not recover from internal error
> >
> > in the /var/log/messages of box02, and neither service is started (IP
> > nor OpenSER). Both boxes give off a pile of info and warning messages
> > that don't seem to point me in a worthwhile direction.
> >
> > Both boxes seem to get to "info: main:attrd.c Starting mainloop..."
> > without any issues. But once they try to decide who has what, they
> > start fighting and rewrite their cib.xml files over and over without
> > end. (Is this normal?) crmd starts complaining about pengine dying on a
> > signal 14, and everything pretty much goes to hell from there.
>
> it _should_ be able to recover.
>
> can you send your logs as attachments? (trying to read logs wrapped at
> 80 chars is hell)
>
> you also didn't mention what version you're running
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>



--
-Anders
-----------------------------------------------------------
Anders Brownworth
[EMAIL PROTECTED]


