On Thu, Sep 29, 2011 at 6:09 PM, Ulrich Windl <[email protected]> wrote:
> Hello!
>
> I'm examining a case where both nodes of a two node cluster were fenced at
> the same time. The cluster is running SLES11 SP1 with a corosync 1.4.1 Update
> to make the rrp stable. I found strange messages:
>
> 08:15:25 h02 cib: [10993]: WARN: cib_process_replace: Replacement 0.952.21
> not applied to 0.952.23: current num_updates is greater than the replacement
> 08:15:25 h02 cib: [10993]: WARN: cib_diff_notify: Update (client: crmd,
> call:13834): -1.-1.-1 -> 0.952.21 (Update was older than existing
> configuration)
> 08:15:25 h02 crmd: [10997]: WARN: finalize_sync_callback: Sync from h06
> resulted in an error: Update was older than existing configuration
> 08:15:25 h02 crmd: [10997]: WARN: do_log: FSA: Input I_ELECTION_DC from
> finalize_sync_callback() received in state S_FINALIZE_JOIN
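For reference, the dotted numbers in those warnings are CIB versions of the form admin_epoch.epoch.num_updates, compared field by field from left to right. The snippet below is only a rough sketch of that ordering, not the actual cib code; the function names are made up for illustration.

```python
def parse_cib_version(text):
    """Split 'admin_epoch.epoch.num_updates' into a tuple of ints.

    Hypothetical helper for illustration; not a Pacemaker function.
    """
    admin_epoch, epoch, num_updates = (int(part) for part in text.split("."))
    return (admin_epoch, epoch, num_updates)


def replacement_is_older(current, replacement):
    """Return True if the replacement version sorts before the current one.

    Tuples compare field by field, so admin_epoch is checked first,
    then epoch, then num_updates.
    """
    return parse_cib_version(replacement) < parse_cib_version(current)


# Versions taken from the log lines quoted above.
print(replacement_is_older("0.952.23", "0.952.21"))
```

With the versions from the log, admin_epoch and epoch are equal and only num_updates differs (21 versus 23), which matches the "current num_updates is greater than the replacement" wording.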
Was there a cluster partition at this time?
It looks like one node's CIB got further ahead than the other's, but since we
regenerate the resource state after an election, there is no harm here.

>
> 08:15:25 h06 crmd: [10847]: debug: crm_compare_age: Loose: 17 vs 268 (seconds)
> 08:15:25 h06 crmd: [10847]: debug: do_election_count_vote: Election 4 (owner:
> h02) lost: vote from h02 (Uptime)
> 08:15:25 h06 crmd: [10847]: info: update_dc: Unset DC h02
> 08:23:02 h06 crmd: [10847]: debug: crm_compare_age: Loose: 18 vs 268 (seconds)
> 08:23:02 h06 crmd: [10847]: debug: do_election_count_vote: Election 5 (owner:
> h02) lost: vote from h02 (Uptime)

The colon is important. h06 lost the election because of the vote from h02.
There was no "lost vote".

> 08:23:02 h06 crmd: [10847]: info: update_dc: Unset DC h02
> 08:23:03 h06 crmd: [10847]: debug: do_cl_join_finalize_respond: join-6: Join
> complete. Sending local LRM status to h02
> 08:23:04 h06 crmd: [10847]: debug: get_xpath_object: No match for
> //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
> 08:24:01 h06 crmd: [10847]: debug: get_xpath_object: No match for
> //cib_update_result//diff-added//crm_config in /notify/cib_update_result/diff
>
> Around at that time I also had this strange message:
> h02:~ # crm_resource -C -r prm_ocfs_fs_samba:0 -N h06
> Cleaning up prm_ocfs_fs_samba:0 on h06
> Waiting for 2 replies from the CRMd.
>
> No messages received in 60 seconds.. aborting
>
> Does anybody have an idea what could be wrong? I think the network was ok.
>
> Regards,
> Ulrich
>
>
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
