Re: [Linux-HA] Q: "lost vote" while network seems up

Andrew Beekhof Tue, 11 Oct 2011 19:42:56 -0700

On Thu, Sep 29, 2011 at 6:09 PM, Ulrich Windl
<[email protected]> wrote:
> Hello!
>
> I'm examining a case where both nodes of a two node cluster were fenced at 
> the same time. The cluster is running SLES11 SP1 with a corosync 1.4.1 Update 
> to make the rrp stable. I found strange messages:
>
> 08:15:25 h02 cib: [10993]: WARN: cib_process_replace: Replacement 0.952.21 
> not applied to 0.952.23: current num_updates is greater than the replacement
> 08:15:25 h02 cib: [10993]: WARN: cib_diff_notify: Update (client: crmd, 
> call:13834): -1.-1.-1 -> 0.952.21 (Update was older than existing 
> configuration)
> 08:15:25 h02 crmd: [10997]: WARN: finalize_sync_callback: Sync from h06 
> resulted in an error: Update was older than existing configuration
> 08:15:25 h02 crmd: [10997]: WARN: do_log: FSA: Input I_ELECTION_DC from 
> finalize_sync_callback() received in state S_FINALIZE_JOIN


Was there a cluster partition at this time?
It looks like one side's CIB got further ahead than the other's, but since we
regenerate the resource state after an election, there is no harm here.
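
To make the "0.952.21 not applied to 0.952.23" warning concrete, here is a
minimal sketch of the comparison, assuming the usual CIB version layout of
admin_epoch.epoch.num_updates compared field by field. The function names
are made up for illustration and are not Pacemaker's own code.

```python
def parse_cib_version(text):
    """Split 'admin_epoch.epoch.num_updates' into a tuple of ints."""
    admin_epoch, epoch, num_updates = (int(part) for part in text.split("."))
    return (admin_epoch, epoch, num_updates)


def replacement_is_older(current, replacement):
    """True if the replacement CIB is older than the one already held.

    Tuples compare left to right, so admin_epoch outranks epoch,
    which outranks num_updates. (Hypothetical helper, for illustration.)
    """
    return parse_cib_version(replacement) < parse_cib_version(current)


# Versions taken from the log lines quoted above.
print(replacement_is_older(current="0.952.23", replacement="0.952.21"))  # True
```

Here the first two fields match and only num_updates differs (21 vs 23),
which is why the replacement from h06 was rejected as older.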

>
> 08:15:25 h06 crmd: [10847]: debug: crm_compare_age: Loose: 17 vs 268 (seconds)
> 08:15:25 h06 crmd: [10847]: debug: do_election_count_vote: Election 4 (owner: 
> h02) lost: vote from h02 (Uptime)
> 08:15:25 h06 crmd: [10847]: info: update_dc: Unset DC h02
> 08:23:02 h06 crmd: [10847]: debug: crm_compare_age: Loose: 18 vs 268 (seconds)
> 08:23:02 h06 crmd: [10847]: debug: do_election_count_vote: Election 5 (owner: 
> h02) lost: vote from h02 (Uptime)

The colon is important.  Read it as "Election 5 lost: vote from h02", i.e.
h06 lost the election because of the vote from h02.  There was no "lost vote".
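
If it helps, the message can be taken apart mechanically. The sketch below
is only an illustration: the regular expression is derived from the two log
lines quoted above, not from the crmd source, and the field names are my own.

```python
import re

# Pattern inferred from the quoted do_election_count_vote lines (an assumption,
# not the format string used by crmd).
ELECTION_RE = re.compile(
    r"Election (?P<id>\d+) \(owner: (?P<owner>[^)\s]+)\) "
    r"(?P<result>\w+): vote from (?P<voter>\S+) \((?P<reason>[^)]+)\)"
)


def parse_election(line):
    """Return the election fields as a dict, or None if the line does not match."""
    match = ELECTION_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["id"] = int(fields["id"])
    return fields


# The wrapped log line from the quoted mail, joined back into one line.
line = ("08:23:02 h06 crmd: [10847]: debug: do_election_count_vote: "
        "Election 5 (owner: h02) lost: vote from h02 (Uptime)")
info = parse_election(line)
print(f"election {info['id']} {info['result']}; "
      f"deciding vote from {info['voter']} ({info['reason']})")
```

The word before the colon is the outcome of the election; everything after
it names the vote that decided the outcome and the criterion used.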

> 08:23:02 h06 crmd: [10847]: info: update_dc: Unset DC h02
> 08:23:03 h06 crmd: [10847]: debug: do_cl_join_finalize_respond: join-6: Join 
> complete. Sending local LRM status to h02
> 08:23:04 h06 crmd: [10847]: debug: get_xpath_object: No match for 
> //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
> 08:24:01 h06 crmd: [10847]: debug: get_xpath_object: No match for 
> //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
>
> Around that time I also had this strange message:
> h02:~ # crm_resource -C -r prm_ocfs_fs_samba:0 -N h06
> Cleaning up prm_ocfs_fs_samba:0 on h06
> Waiting for 2 replies from the CRMd.
>
> No messages received in 60 seconds.. aborting
>
> Does anybody have an idea what could be wrong? I think the network was ok.
>
> Regards,
> Ulrich
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
>
