Re: [Linux-HA] We are still in a transition. Delaying until the TE completes.

Andrew Beekhof Fri, 28 Aug 2009 05:24:18 -0700

On Fri, Aug 28, 2009 at 2:44 AM, fengyandong<[email protected]> wrote:
> Thanks for your help! Could you tell me what's wrong?


No idea, sorry.
The reason we put out new versions is that they contain fixes for things like this.

> If I still want to
> use version 2.1.4, how can I solve the problem?

Scan through a few thousand patches to find the right one?

> 2009/8/27 Andrew Beekhof <[email protected]>
>
>> On Thu, Aug 27, 2009 at 12:42 PM, fengyandong<[email protected]> wrote:
>> > The attachment is /var/log/messages from the passive node. The heartbeat
>> > version is 2.1.4.
>>
>> Far too old. Sorry.
>> Almost certainly any problem you have was fixed a long time ago.
>> Head over to http://www.clusterlabs.org/wiki/Install where you'll find
>> instructions on installing something recent.
>>
>> >
>> > 2009/8/27 Andrew Beekhof <[email protected]>
>> >
>> >> Please send logs as attachments (they're impossible to read otherwise)
>> >> and provide version details.
>> >>
>> >> On Thu, Aug 27, 2009 at 10:56 AM, fengyandong<[email protected]> wrote:
>> >> > I configured a Heartbeat cluster with two nodes in an active/passive
>> >> > configuration.
>> >> > When the active node is powered off abruptly, the passive node does not
>> >> > take over the resource as expected.
>> >> >
>> >> > The /var/log/ha-log from the passive node is here:
>> >> >
>> >> > Aug 26 13:52:33 paul kernel: ib0: multicast join failed for ff12:401b:ffff:0000:0000:0000:ffff:ffff, status -11
>> >> > Aug 26 13:52:35 paul heartbeat: [12113]: WARN: node peter: is dead
>> >> > Aug 26 13:52:35 paul heartbeat: [12113]: info: Send event about the given node dead: [bwevent_send --level error --module heartbeat --code 30100026 --desc "paul can not connect to peter."]
>> >> > Aug 26 13:52:35 paul crmd: [12132]: notice: crmd_ha_status_callback: Status update: Node peter now has status [dead]
>> >> > Aug 26 13:52:35 paul heartbeat: [12113]: info: Link peter:ib0 dead.
>> >> > Aug 26 13:52:35 paul heartbeat: [12113]: info: Send event about the given link status: [bwevent_send --level error --code 30100025 --module heartbeat --desc "Heartbeat on network interface (ib0) fault."]
>> >> > Aug 26 13:52:35 paul crmd: [12132]: WARN: do_dc_join_finalize: join-2: We are still in a transition.  Delaying until the TE completes.
>> >> > Aug 26 13:52:35 paul tengine: [29501]: WARN: match_down_event: No match for shutdown action on 4914271e-4f59-4d45-a5ef-437b0c884629
>> >> > Aug 26 13:52:35 paul tengine: [29501]: info: extract_event: Stonith/shutdown of 4914271e-4f59-4d45-a5ef-437b0c884629 not matched
>> >> > Aug 26 13:52:35 paul crmd: [12132]: WARN: do_dc_join_finalize: join-2: We are still in a transition.  Delaying until the TE completes.
>> >> > Aug 26 13:52:35 paul cib: [12128]: info: mem_handle_event: Got an event OC_EV_MS_NOT_PRIMARY from ccm
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: mem_handle_event: Got an event OC_EV_MS_NOT_PRIMARY from ccm
>> >> > Aug 26 13:52:35 paul cib: [12128]: info: mem_handle_event: instance=11, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: mem_handle_event: instance=11, nodes=2, new=1, lost=0, n_idx=0, new_idx=2, old_idx=4
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: crmd_ccm_msg_callback: Quorum lost after event=NOT PRIMARY (id=11)
>> >> > Aug 26 13:52:35 paul crmd: [12132]: WARN: do_dc_join_finalize: join-2: We are still in a transition.  Delaying until the TE completes.
>> >> > Aug 26 13:52:35 paul ccm: [12127]: info: ccm_state_sent_memlistreq: directly callccm_compute_and_send_final_memlist()
>> >> > Aug 26 13:52:35 paul ccm: [12127]: info: Break tie for 2 nodes cluster
>> >> > Aug 26 13:52:35 paul cib: [12128]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: mem_handle_event: Got an event OC_EV_MS_INVALID from ccm
>> >> > Aug 26 13:52:35 paul cib: [12128]: info: mem_handle_event: no mbr_track info
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: mem_handle_event: no mbr_track info
>> >> > Aug 26 13:52:35 paul ccm: [12127]: info: Break tie for 2 nodes cluster
>> >> > Aug 26 13:52:35 paul cib: [12128]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: mem_handle_event: Got an event OC_EV_MS_NEW_MEMBERSHIP from ccm
>> >> > Aug 26 13:52:35 paul cib: [12128]: info: mem_handle_event: instance=12, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: mem_handle_event: instance=12, nodes=1, new=0, lost=1, n_idx=0, new_idx=1, old_idx=3
>> >> > Aug 26 13:52:35 paul cib: [12128]: info: cib_ccm_msg_callback: LOST: peter
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: crmd_ccm_msg_callback: Quorum (re)attained after event=NEW MEMBERSHIP (id=12)
>> >> > Aug 26 13:52:35 paul cib: [12128]: info: cib_ccm_msg_callback: PEER: paul
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: erase_node_from_join: Removed dead node peter from join calculations: welcomed=0 itegrated=0 finalized=0 confirmed=0
>> >> > Aug 26 13:52:35 paul crmd: [12132]: WARN: check_dead_member: Our DC node (peter) left the cluster
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: ccm_event_detail: NEW MEMBERSHIP: trans=12, nodes=1, new=0, lost=1 n_idx=0, new_idx=1, old_idx=3
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: ccm_event_detail:  CURRENT: paul [nodeid=0, born=12]
>> >> > Aug 26 13:52:35 paul crmd: [12132]: info: ccm_event_detail:  LOST:  peter [nodeid=1, born=1]
>> >> > Aug 26 13:52:35 paul crmd: [12132]: WARN: do_dc_join_finalize: join-2: We are still in a transition.  Delaying until the TE completes.
>> >> > Aug 26 13:52:35 paul crmd: [12132]: WARN: register_fsa_input_adv: do_dc_join_finalize stalled the FSA with pending inputs
>> >> > Aug 26 13:52:35 paul crmd: [12132]: WARN: do_dc_join_finalize: join-2: We are still in a transition.  Delaying until the TE completes.
>> >> > Aug 26 13:52:36 paul crmd: [12132]: WARN: do_dc_join_finalize: join-2: We are still in a transition.  Delaying until the TE completes.
>> >> > Aug 26 13:52:42 paul OpenSM[7074]: SM port is down
>> >> > Aug 26 13:52:42 paul OpenSM[7074]: Entering DISCOVERING state
>> >> > Aug 26 13:52:44 paul : error initializing: File contains no section headers. file: file://///etc/yum.repos.d/CentoOS-Base.repo, line: 1 'gpgcheck=1\n'
>> >> > Aug 26 13:52:52 paul OpenSM[7074]: SM port is down
>> >> > Aug 26 13:52:52 paul kernel: enfs: server 192.168.1.3 not responding, still trying
>> >> > Aug 26 13:53:02 paul OpenSM[7074]: SM port is down
>> >> > Aug 26 13:53:42 paul last message repeated 4 times
>> >> > Aug 26 13:53:49 paul kernel: nfs: server 192.168.1.3 not responding, still trying
>> >> > Aug 26 13:53:52 paul OpenSM[7074]: SM port is down
>> >> > Aug 26 13:54:02 paul OpenSM[7074]: SM port is down
>> >> > Aug 26 13:54:08 paul cib: [12128]: info: cib_stats: Processed 11 operations (909.00us average, 0% utilization) in the last 10min
>> >> > Aug 26 13:54:12 paul OpenSM[7074]: SM port is down
>> >> > Aug 26 13:54:32 paul last message repeated 2 times
>> >> > Aug 26 13:54:39 paul crm_mon: [11765]: info: G_main_add_SignalHandler: Added signal handler for signal 15
>> >> > Aug 26 13:54:39 paul crm_mon: [11765]: info: G_main_add_SignalHandler: Added signal handler for signal 2
>> >> >
>> >> >
>> >> > Any help would be appreciated.
>> >> > Thanks!
>> >> > _______________________________________________
>> >> > Linux-HA mailing list
>> >> > [email protected]
>> >> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
>> >> > See also: http://linux-ha.org/ReportingProblems
>> >> >
