Indeed, that was it. I recompiled by hand from source and everything seems to be working. Must have been some bad code somewhere.
Thanks for the help.

-Anders

On 6/13/07, Anders Brownworth <[EMAIL PROTECTED]> wrote:
It was a source emerge from a Gentoo 2006.1 build. I'll compile by hand from the source and reply with my experience.

Thanks for the response.

-Anders

On 6/13/07, Yan Fitterer <[EMAIL PROTECTED]> wrote:
>
> Where do the heartbeat 2.0.7 binaries come from? What packaging? What
> distribution?
>
> It's starting to sound (to me at least ;) like some broken heartbeat
> package.
>
> Yan
>
> PS - it is not normal for the CIB to be rewritten every few seconds. It
> should be rewritten when something in the cluster state changes
> (resource or node status), which should not happen for no reason.
>
> Anders Brownworth wrote:
> > So to add some color to this problem, I'm seeing these errors on box02:
> >
> > ERROR: Exiting pengine process 4689 dumped core
> > ERROR: crmdManagedChildDied:subsystems.c The pengine subsystem terminated unexpectedly
> > ERROR: do_log:misc.c [[FSA]] Input I_ERROR from crmdManagedChildDied() received in state (S_TRANSITION_ENGINE)
> > ERROR: do_recover:control.c Action A_RECOVER (0000000001000000) not supported
> > ERROR: do_log:misc.c [[FSA]] Input I_STOP from do_recover() received in state (S_RECOVERY)
> >
> > This is a 2-box setup migrating an IP and a service (OpenSER) to run on
> > one of the nodes at any given time. Box01 doesn't see any errors, and just
> > leaving the nodes alone to work it out doesn't work. I am unclear on the
> > cib.xml because I haven't seen it work any other way. Is it normal to
> > have that file rewritten every 5 seconds or so? That seems abnormal to
> > me, but I can't get it confirmed in the docs.
> >
> > I'm happy to grab any other information if it might help. Thanks.
> >
> > -Anders
> >
> > On 6/12/07, Anders Brownworth <[EMAIL PROTECTED]> wrote:
> >>
> >> Andrew,
> >>
> >> Sorry about the formatting. Attached are the /var/log/messages output of
> >> box01 and box02 as well as the cib.xml. Also, this is heartbeat version
> >> 2.0.7.
> >>
> >> As you can see in the messages, this isn't a problem that gets worked
> >> out. Both machines keep writing and rewriting their cib.xml files all
> >> day long. I gather this isn't normal.
> >>
> >> Thanks for any light you can shed.
> >>
> >> -Anders
> >>
> >> On 6/12/07, Andrew Beekhof <[EMAIL PROTECTED]> wrote:
> >> >
> >> > On 6/12/07, Anders Brownworth <[EMAIL PROTECTED]> wrote:
> >> > > Hi,
> >> > >
> >> > > I have a heartbeat v2 setup that I am trying to use to migrate OpenSER
> >> > > and an IP address back and forth between 2 boxes (box01 and box02). I
> >> > > wrote an OCF resource agent for OpenSER and am using the
> >> > > heartbeat-provided IPaddr agent to manage the IP address. My OCF agent
> >> > > checks out with the ocf-tester script, and my cib.xml verifies with
> >> > > crm_verify -x /var/lib/heartbeat/crm/cib.xml.
> >> > >
> >> > > When I start both nodes with exactly the same configuration, they
> >> > > fight about the state of things, and the first ERROR I get is:
> >> > >
> >> > > crmd: [3507]: ERROR: do_exit: control.c Could not recover from internal error
> >> > >
> >> > > in the /var/log/messages of box02, and neither service (the IP nor
> >> > > OpenSER) is started. Both boxes give off a pile of info and warning
> >> > > messages that don't seem to point me in a worthwhile direction.
> >> > >
> >> > > Both boxes seem to get to "info: main:attrd.c Starting mainloop..."
> >> > > without any issues. But once they try to decide who has what, they
> >> > > start fighting and rewriting their cib.xml files over and over
> >> > > endlessly. (Is this normal?) crmd starts complaining about pengine
> >> > > dying on signal 14, and everything pretty much goes to hell from there.
> >> >
> >> > it _should_ be able to recover.
> >> >
> >> > can you send your logs as attachments?
> >> > (trying to read logs wrapped at 80 chars is hell)
> >> >
> >> > you also didn't mention what version you're running
> >> > _______________________________________________
> >> > Linux-HA mailing list
> >> > [email protected]
> >> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> >> > See also: http://linux-ha.org/ReportingProblems
> >> >
> >>
> >> --
> >> -Anders
> >> -----------------------------------------------------------
> >> Anders Brownworth
> >> [EMAIL PROTECTED]
> >>
> >
>

--
-Anders
-----------------------------------------------------------
Anders Brownworth
[EMAIL PROTECTED]
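Yan's point that the CIB should only be rewritten when the cluster state changes can be checked directly by watching how often cib.xml changes on disk. A minimal sketch, assuming Python 3 is available on the node and the file lives at /var/lib/heartbeat/crm/cib.xml as in the crm_verify command earlier in the thread; the helpers watch_mtime and rewrite_intervals are hypothetical names for illustration, not part of heartbeat:

```python
import os
import time

# Path taken from the crm_verify command in the thread (heartbeat 2.x);
# this is an assumption, adjust it if your build stores the CIB elsewhere.
CIB_PATH = "/var/lib/heartbeat/crm/cib.xml"


def watch_mtime(path, duration=60.0, poll=0.5):
    """Poll `path` for `duration` seconds and return each distinct mtime seen, in order.

    The first entry is the mtime observed on the first successful stat.
    Rewrites that land within a single poll interval collapse into one entry,
    so the result is a lower bound on the number of rewrites.
    """
    seen = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            mtime = None  # the file may be briefly absent while it is being replaced
        if mtime is not None and (not seen or mtime != seen[-1]):
            seen.append(mtime)
        time.sleep(poll)
    return seen


def rewrite_intervals(mtimes):
    """Return the gaps, in seconds, between successive mtimes."""
    return [later - earlier for earlier, later in zip(mtimes, mtimes[1:])]
```

For example, `rewrite_intervals(watch_mtime(CIB_PATH, duration=60))` returns the gaps between rewrites seen over one minute. A steady run of roughly 5-second gaps on a cluster where no resource or node status is changing would match the symptom Anders describes, while a healthy idle cluster should show few or none. Polling is used instead of inotify only to keep the sketch dependency-free.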
