Where do the heartbeat 2.0.7 binaries come from? What packaging? What
distribution?

It's starting to sound (to me at least ;) like some broken heartbeat
package.

Yan

PS - it is not normal for the CIB to be rewritten every few seconds. It
should be rewritten when something in the cluster state changes
(resource or node status) - which should not happen for no reason.
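
If you want to put a number on how often the file is being rewritten, here is a rough Python sketch. It is not part of heartbeat; the function name count_rewrites and the polling interval are my own assumptions. It simply polls the file and counts how often its mtime, inode or size changes:

```python
import os
import time


def count_rewrites(path, duration_s=60.0, poll_s=0.5):
    """Count how often `path` changes (mtime, inode or size) within duration_s seconds.

    Hypothetical helper for illustration only. Polling can miss rewrites that
    happen faster than poll_s, so the result is a lower bound.
    """
    def stamp():
        try:
            st = os.stat(path)
        except FileNotFoundError:
            # The file may be briefly absent while it is being replaced.
            return None
        return (st.st_mtime_ns, st.st_ino, st.st_size)

    last = stamp()
    changes = 0
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        time.sleep(poll_s)
        now = stamp()
        if now is not None and now != last:
            changes += 1
            last = now
    return changes
```

Calling count_rewrites("/var/lib/heartbeat/crm/cib.xml", duration_s=60) on each node while the cluster is otherwise idle should return 0 or close to it. A double-digit count would match the "every 5 seconds" behaviour you describe.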

Anders Brownworth wrote:
> So to add some color to this problem, I'm seeing these errors on box02:
> 
> ERROR: Exiting pengine process 4689 dumped core
> ERROR: crmdManagedChildDied:subsystems.c The pengine subsystem terminated unexpectedly
> ERROR: do_log:misc.c [[FSA]] Input I_ERROR from crmdManagedChildDied() received in state (S_TRANSITION_ENGINE)
> ERROR: do_recover:control.c Action A_RECOVER (0000000001000000) not supported
> ERROR: do_log:misc.c [[FSA]] Input I_STOP from do_recover() received in state (S_RECOVERY)
> 
> This is a 2 box setup migrating an IP and service (OpenSER) to run on one
> of the nodes at any given time. Box01 doesn't see any errors, and just
> leaving the nodes alone to work it out doesn't work. I am unclear on the
> cib.xml because I haven't seen it work any other way. Is it normal to have
> that file rewritten every 5 seconds or so? That seems abnormal to me, but I
> can't get it confirmed in the docs.
> 
> I'm happy to grab any other information if it might help. Thanks.
> 
> -Anders
> 
> On 6/12/07, Anders Brownworth <[EMAIL PROTECTED]> wrote:
>>
>> Andrew,
>>
>> Sorry about the formatting. Attached are the /var/log/messages output of
>> box01 and box02 as well as the cib.xml. Also, this is heartbeat version
>> 2.0.7.
>>
>> As you can see in the messages, this isn't a problem that gets worked
>> out. Both machines keep writing and re-writing their cib.xml files all
>> day long. I gather this isn't normal.
>>
>> Thanks for any light you can shed.
>>
>> -Anders
>>
>> On 6/12/07, Andrew Beekhof < [EMAIL PROTECTED]> wrote:
>> >
>> > On 6/12/07, Anders Brownworth < [EMAIL PROTECTED]> wrote:
>> > > Hi,
>> > >
>> > > I have a heartbeat v2 setup that I am trying to use to migrate OpenSER
>> > > and an IP address back and forth between 2 boxes (box01 and box02). I
>> > > wrote an OCF for OpenSER and am using the heartbeat provided IPaddr to
>> > > manage the IP address. My OCF checks out with the ocf-tester script and
>> > > my cib.xml verifies with crm_verify -x /var/lib/heartbeat/crm/cib.xml.
>> > >
>> > > When I start both nodes with exactly the same configuration, they fight
>> > > about the state of things and the first ERROR I get is:
>> > >
>> > > crmd: [3507]: ERROR: do_exit:control.c Could not recover from internal error
>> > >
>> > > in the /var/log/messages of box02 and neither service (IP nor OpenSER)
>> > > is started. Both boxes give off a pile of info and warning messages that
>> > > don't seem to point me in a worthwhile direction.
>> > >
>> > > Both boxes seem to get to "info: main:attrd.c Starting mainloop..."
>> > > without any issues. But once they try to decide on who has what, they
>> > > start fighting and constantly rewriting their cib.xml files over and
>> > > over, endlessly. (Is this normal?) crmd starts complaining about pengine
>> > > dying on a signal 14 and everything pretty much goes to hell from there.
>> >
>> > it _should_ be able to recover.
>> >
>> > can you send your logs as attachments? (trying to read logs wrapped at
>> > 80 chars is hell)
>> >
>> > you also didn't mention what version you're running
>> > _______________________________________________
>> > Linux-HA mailing list
>> > [email protected]
>> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> > See also: http://linux-ha.org/ReportingProblems
>> >
>>
>>
>>
>> -- 
>> -Anders
>> -----------------------------------------------------------
>> Anders Brownworth
>> [EMAIL PROTECTED]
>>
>>
> 
> 