Re: [Linux-HA] crmd: [4681]: ERROR: Exiting pengine process 4689 dumped core

Anders Brownworth Wed, 13 Jun 2007 07:58:32 -0700

Indeed, that was it. I recompiled by hand from source and everything seems
to be working. Must have been some bad code somewhere.


Thanks for the help.

-Anders

On 6/13/07, Anders Brownworth <[EMAIL PROTECTED]> wrote:


It was a source emerge from a gentoo build v 2006.1. I'll compile by hand
from the source and reply with my experience. Thanks for the response.

-Anders

On 6/13/07, Yan Fitterer <[EMAIL PROTECTED]> wrote:
>
> Where do the heartbeat 2.0.7 binaries come from? What packaging? What
> distribution?
>
> It's starting to sound (to me at least ;) like some broken heartbeat
> package.
>
> Yan
>
> PS - it is not normal for the CIB to be rewritten every few seconds. It
> should be rewritten when something in the cluster state changes
> (resource or node status) - which should not happen for no reason.
>
> Anders Brownworth wrote:
> > So to add some color to this problem, I'm seeing these errors on box02:
> >
> > ERROR: Exiting pengine process 4689 dumped core
> > ERROR: crmdManagedChildDied:subsystems.c The pengine subsystem terminated unexpectedly
> > ERROR: do_log:misc.c [[FSA]] Input I_ERROR from crmdManagedChildDied() received in state (S_TRANSITION_ENGINE)
> > ERROR: do_recover:control.c Action A_RECOVER (0000000001000000) not supported
> > ERROR: do_log:misc.c [[FSA]] Input I_STOP from do_recover() received in state (S_RECOVERY)
> >
> > This is a 2 box setup migrating an IP and service (OpenSER) to run on
> > one of the nodes at any given time. Box01 doesn't see any errors, and just
> > leaving the nodes alone to work it out doesn't work. I am unclear on the
> > cib.xml because I haven't seen it work any other way. Is it normal to
> > have that file rewritten every 5 seconds or so? That seems abnormal to me,
> > but I can't get that confirmed in the docs.
> >
> > I'm happy to grab any other information if it might help. Thanks.
> >
> > -Anders
> >
> > On 6/12/07, Anders Brownworth <[EMAIL PROTECTED]> wrote:
> >>
> >> Andrew,
> >>
> >> Sorry about the formatting. Attached are the /var/log/messages output of
> >> box01 and box02 as well as the cib.xml. Also, this is heartbeat version
> >> 2.0.7.
> >>
> >> As you can see in the messages, this isn't a problem that gets worked out.
> >> Both machines keep writing and re-writing their cib.xml files all day
> >> long. I gather this isn't normal.
> >>
> >> Thanks for any light you can shed.
> >>
> >> -Anders
> >>
> >> On 6/12/07, Andrew Beekhof < [EMAIL PROTECTED]> wrote:
> >> >
> >> > On 6/12/07, Anders Brownworth < [EMAIL PROTECTED]> wrote:
> >> > > Hi,
> >> > >
> >> > > I have a heartbeat v2 setup that I am trying to use to migrate OpenSER and
> >> > > an IP address back and forth between 2 boxes (box01 and box02). I wrote an
> >> > > OCF for OpenSER and am using the heartbeat provided IPaddr to manage the IP
> >> > > address. My OCF checks out with the ocf-tester script and my
> >> > > cib.xml verifies with crm_verify -x /var/lib/heartbeat/crm/cib.xml.
> >> > >
> >> > > When I start both nodes with exactly the same configuration, they fight
> >> > > about the state of things, and the first ERROR I get is:
> >> > >
> >> > > crmd: [3507]: ERROR: do_exit: control.c Could not recover from internal error
> >> > >
> >> > > in the /var/log/messages of box02, and neither service is started (neither
> >> > > the IP nor OpenSER). Both boxes give off a pile of info and warning messages
> >> > > that don't seem to point me in a worthwhile direction.
> >> > >
> >> > > Both boxes seem to get to "info: main:attrd.c Starting mainloop..." without
> >> > > any issues. But once they try to decide on who has what, they start fighting
> >> > > and constantly rewriting their cib.xml files over and over, endlessly. (Is
> >> > > this normal?) crmd starts complaining about pengine dying on a signal 14, and
> >> > > everything pretty much goes to hell from there.
> >> >
> >> > it _should_ be able to recover.
> >> >
> >> > can you send your logs as attachments? (trying to read logs wrapped at
> >> > 80 chars is hell)
> >> >
> >> > you also didn't mention what version you're running
> >> > _______________________________________________
> >> > Linux-HA mailing list
> >> > [email protected]
> >> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> > See also: http://linux-ha.org/ReportingProblems
> >> >
> >>
> >>
> >>
> >> --
> >> -Anders
> >> -----------------------------------------------------------
> >> Anders Brownworth
> >> [EMAIL PROTECTED]
> >>
> >>
> >
> >
>



