On 10/02/14 10:47, Jan Friesse wrote:
Alessandro, can you find a message like "Corosync main process was not scheduled for ... ms" in the log file? (corosync must be at least 1.4.1-16, so CentOS 6.5.)
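For reference, a quick way to search for it (a sketch; the log path /var/log/cluster/corosync.log is an assumption that depends on your logging setup):

```shell
# The warning corosync 1.4.1-16 and later logs when it is starved of CPU
# looks roughly like the sample below (the timings here are invented,
# purely for illustration):
sample='corosync [MAIN ] Corosync main process was not scheduled for 7500.2 ms (threshold is 4000.0 ms). Consider token timeout increase.'

# A pattern that matches it regardless of the exact timings:
printf '%s\n' "$sample" | grep -c 'was not scheduled for'

# On a real node (log path is an assumption; adjust to your logging setup):
#   grep 'was not scheduled for' /var/log/cluster/corosync.log
```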
Hi,

there is no message like that in the log file. The distro is CentOS 6.5:

rpm -qa corosync
corosync-1.4.1-17.el6.x86_64
Regards,
Honza

Alessandro Bono wrote:

Hi,

after changing the cluster from corosync to cman+corosync (switching from CentOS 6.3 to 6.4) I have a recurring problem with pacemaker/corosync. Pacemaker reports this error:

pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)

and shuts itself down. This normally happens when the host machine is under high load, for example during a full backup. In addition, there are a lot of these messages:

Feb 01 23:27:04 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 23:27:04 corosync [TOTEM ] Automatically recovered ring 1
Feb 01 23:27:06 corosync [TOTEM ] Marking ringid 0 interface 10.12.32.1 FAULTY
Feb 01 23:27:07 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 23:27:07 corosync [TOTEM ] Automatically recovered ring 0
Feb 01 23:27:07 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 23:27:07 corosync [TOTEM ] Automatically recovered ring 0
Feb 01 23:27:09 corosync [TOTEM ] Marking ringid 1 interface 10.12.23.1 FAULTY
Feb 01 23:27:10 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 23:27:10 corosync [TOTEM ] Automatically recovered ring 1
Feb 01 23:27:10 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 23:27:10 corosync [TOTEM ] Automatically recovered ring 0
Feb 01 23:27:12 corosync [TOTEM ] Marking ringid 1 interface 10.12.23.1 FAULTY
Feb 01 23:27:12 corosync [TOTEM ] Marking ringid 0 interface 10.12.32.1 FAULTY
Feb 01 23:27:13 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 23:27:13 corosync [TOTEM ] received message requesting test of ring now active

I reported this problem to the pacemaker mailing list, but they said it's a corosync problem. The same problem occurs with CentOS 6.5. I tried switching communication to udpu and adding another communication path, but without any luck. The cluster nodes are KVM virtual machines. Is it a configuration problem?
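One avenue worth trying (a sketch, not a verified fix): repeated FAULTY/recovered ring cycling under high host load is consistent with the corosync process not being scheduled in time, and on KVM guests a common mitigation is raising the totem token timeout. With cman the timeout is set in cluster.conf; the 10000 ms value below is an illustrative guess, not a tuned recommendation:

```xml
<!-- Illustrative fragment for /etc/cluster/cluster.conf: raise the totem
     token timeout so brief scheduling stalls on the hypervisor do not
     cause token loss. The 10000 ms value is an example, not a tuned one. -->
<cluster config_version="9" name="ga-ext_cluster">
  <cman transport="udpu"/>
  <totem token="10000"/>
  ...
</cluster>
```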
Some info below; I can provide the full log if necessary.

rpm -qa | egrep "pacem|coro" | sort
corosync-1.4.1-17.el6.x86_64
corosynclib-1.4.1-17.el6.x86_64
drbd-pacemaker-8.3.16-1.el6.x86_64
pacemaker-1.1.10-14.el6_5.2.x86_64
pacemaker-cli-1.1.10-14.el6_5.2.x86_64
pacemaker-cluster-libs-1.1.10-14.el6_5.2.x86_64
pacemaker-debuginfo-1.1.10-1.el6.x86_64
pacemaker-libs-1.1.10-14.el6_5.2.x86_64

cat /etc/cluster/cluster.conf
<cluster config_version="8" name="ga-ext_cluster">
  <cman transport="udpu"/>
  <logging>
    <logging_daemon name="corosync" debug="on"/>
  </logging>
  <clusternodes>
    <clusternode name="ga1-ext" nodeid="1">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="ga1-ext"/>
        </method>
      </fence>
      <altname name="ga1-ext_alt"/>
    </clusternode>
    <clusternode name="ga2-ext" nodeid="2">
      <fence>
        <method name="pcmk-redirect">
          <device name="pcmk" port="ga2-ext"/>
        </method>
      </fence>
      <altname name="ga2-ext_alt"/>
    </clusternode>
  </clusternodes>
  <fencedevices>
    <fencedevice agent="fence_pcmk" name="pcmk"/>
  </fencedevices>
</cluster>

crm configure show
node ga1-ext \
    attributes standby="off"
node ga2-ext \
    attributes standby="off"
primitive ClusterIP ocf:heartbeat:IPaddr \
    params ip="10.12.23.3" cidr_netmask="24" \
    op monitor interval="30s"
primitive SharedFS ocf:heartbeat:Filesystem \
    params device="/dev/drbd/by-res/r0" directory="/shared" fstype="ext4" options="noatime,nobarrier"
primitive dovecot lsb:dovecot
primitive drbd0 ocf:linbit:drbd \
    params drbd_resource="r0" \
    op monitor interval="15s"
primitive drbdlinks ocf:tummy:drbdlinks
primitive mail ocf:heartbeat:MailTo \
    params email="[email protected]" subject="ga-ext cluster - "
primitive mysql lsb:mysqld
group service_group SharedFS drbdlinks ClusterIP mail mysql dovecot \
    meta target-role="Started"
ms ms_drbd0 drbd0 \
    meta master-max="1" master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
colocation service_on_drbd inf: service_group ms_drbd0:Master
order service_after_drbd inf: ms_drbd0:promote service_group:start
property $id="cib-bootstrap-options" \
    dc-version="1.1.10-14.el6_5.2-368c726" \
    cluster-infrastructure="cman" \
    expected-quorum-votes="2" \
    stonith-enabled="false" \
    no-quorum-policy="ignore" \
    last-lrm-refresh="1391290945" \
    maintenance-mode="false"
rsc_defaults $id="rsc-options" \
    resource-stickiness="100"

Extract from cluster.log:

Feb 01 21:40:15 corosync [MAIN ] Completed service synchronization, ready to provide service.
Feb 01 21:40:15 corosync [TOTEM ] waiting_trans_ack changed to 0
Feb 01 21:40:15 corosync [TOTEM ] Marking ringid 1 interface 10.12.23.1 FAULTY
Feb 01 21:40:15 [13253] ga1-ext cib: info: crm_cs_flush: Sent 4 CPG messages (0 remaining, last=48): OK (1)
Feb 01 21:40:15 [13256] ga1-ext crmd: info: crm_cs_flush: Sent 3 CPG messages (0 remaining, last=24): OK (1)
Feb 01 21:40:16 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 21:40:16 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 21:40:16 corosync [TOTEM ] received message requesting test of ring now active
Feb 01 21:40:16 corosync [TOTEM ] Automatically recovered ring 0
Feb 01 21:40:16 corosync [TOTEM ] Automatically recovered ring 1
Feb 01 21:40:16 corosync [TOTEM ] Automatically recovered ring 1
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_diff: Diff 0.299.3 -> 0.299.4 from ga2-ext not applied to 0.299.11: current "num_updates" is greater than required
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='fail-count-drbd0']: No such device or address (rc=-6, origin=local/attrd/34, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='last-failure-mysql']: No such device or address (rc=-6, origin=local/attrd/35, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='last-failure-drbd0']: No such device or address (rc=-6, origin=local/attrd/36, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_diff: Diff 0.299.4 -> 0.299.5 from ga2-ext not applied to 0.299.11: current "num_updates" is greater than required
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_diff: Diff 0.299.5 -> 0.299.6 from ga2-ext not applied to 0.299.11: current "num_updates" is greater than required
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_diff: Diff 0.299.6 -> 0.299.7 from ga2-ext not applied to 0.299.11: current "num_updates" is greater than required
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_diff: Diff 0.299.7 -> 0.299.8 from ga2-ext not applied to 0.299.11: current "num_updates" is greater than required
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_diff: Diff 0.299.8 -> 0.299.9 from ga2-ext not applied to 0.299.11: current "num_updates" is greater than required
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='master-drbd0']: OK (rc=0, origin=local/attrd/37, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/38, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='last-failure-ClusterIP']: No such device or address (rc=-6, origin=local/attrd/39, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='probe_complete']: OK (rc=0, origin=local/attrd/40, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/41, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_query operation for section //cib/status//node_state[@id='ga1-ext']//transient_attributes//nvpair[@name='master-drbd0']: OK (rc=0, origin=local/attrd/42, version=0.299.11)
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_request: Completed cib_modify operation for section status: OK (rc=0, origin=local/attrd/43, version=0.299.11)
Feb 01 21:40:17 [13256] ga1-ext crmd: info: register_fsa_error_adv: Resetting the current action list
Feb 01 21:40:17 [13256] ga1-ext crmd: warning: crmd_ha_msg_filter: Another DC detected: ga2-ext (op=noop)
Feb 01 21:40:17 [13256] ga1-ext crmd: info: register_fsa_error_adv: Resetting the current action list
Feb 01 21:40:17 [13256] ga1-ext crmd: warning: crmd_ha_msg_filter: Another DC detected: ga2-ext (op=noop)
Feb 01 21:40:17 corosync [CMAN ] ais: deliver_fn source nodeid = 2, len=24, endian_conv=0
Feb 01 21:40:17 corosync [CMAN ] memb: Message on port 0 is 6
Feb 01 21:40:17 corosync [CMAN ] memb: got KILL for node 1
Feb 01 21:40:17 [13256] ga1-ext crmd: info: register_fsa_error_adv: Resetting the current action list
Feb 01 21:40:17 [13256] ga1-ext crmd: warning: crmd_ha_msg_filter: Another DC detected: ga2-ext (op=noop)
Feb 01 21:40:17 [13256] ga1-ext crmd: info: register_fsa_error_adv: Resetting the current action list
Feb 01 21:40:17 [13256] ga1-ext crmd: warning: crmd_ha_msg_filter: Another DC detected: ga2-ext (op=join_offer)
Feb 01 21:40:17 [13256] ga1-ext crmd: info: do_state_transition: State transition S_INTEGRATION -> S_ELECTION [ input=I_ELECTION cause=C_FSA_INTERNAL origin=crmd_ha_msg_filter ]
Feb 01 21:40:17 [13256] ga1-ext crmd: info: update_dc: Unset DC. Was ga1-ext
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_diff: Diff 0.299.9 -> 0.299.10 from ga2-ext not applied to 0.299.11: current "num_updates" is greater than required
Feb 01 21:40:17 [13253] ga1-ext cib: info: cib_process_diff: Diff 0.299.10 -> 0.299.11 from ga2-ext not applied to 0.299.11: current "num_updates" is greater than required
Feb 01 21:40:18 [13247] ga1-ext pacemakerd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Feb 01 21:40:18 [13247] ga1-ext pacemakerd: error: mcp_cpg_destroy: Connection destroyed
Feb 01 21:40:18 [13247] ga1-ext pacemakerd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Feb 01 21:40:18 [13255] ga1-ext attrd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Feb 01 21:40:18 [13255] ga1-ext attrd: crit: attrd_cs_destroy: Lost connection to Corosync service!
Feb 01 21:40:18 [13255] ga1-ext attrd: notice: main: Exiting...
Feb 01 21:40:18 [13255] ga1-ext attrd: notice: main: Disconnecting client 0x238ff10, pid=13256...
Feb 01 21:40:18 [13255] ga1-ext attrd: error: attrd_cib_connection_destroy: Connection to the CIB terminated...
Feb 01 21:40:18 [13254] ga1-ext stonith-ng: info: stonith_shutdown: Terminating with 1 clients
Feb 01 21:40:18 [13254] ga1-ext stonith-ng: info: cib_connection_destroy: Connection to the CIB closed.
Feb 01 21:40:18 [13254] ga1-ext stonith-ng: info: crm_client_destroy: Destroying 0 events
Feb 01 21:40:18 [13254] ga1-ext stonith-ng: info: qb_ipcs_us_withdraw: withdrawing server sockets
Feb 01 21:40:18 [13254] ga1-ext stonith-ng: info: main: Done
Feb 01 21:40:18 [13254] ga1-ext stonith-ng: info: crm_xml_cleanup: Cleaning up memory from libxml2
Feb 01 21:40:18 [13256] ga1-ext crmd: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Feb 01 21:40:18 [13256] ga1-ext crmd: error: crmd_cs_destroy: connection terminated
Feb 01 21:40:18 [13256] ga1-ext crmd: info: qb_ipcs_us_withdraw: withdrawing server sockets
Feb 01 21:40:18 [13253] ga1-ext cib: error: pcmk_cpg_dispatch: Connection to the CPG API failed: Library error (2)
Feb 01 21:40:18 [13253] ga1-ext cib: error: cib_cs_destroy: Corosync connection lost! Exiting.
Feb 01 21:40:18 [13253] ga1-ext cib: info: terminate_cib: cib_cs_destroy: Exiting fast...
Feb 01 21:40:18 [13253] ga1-ext cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Feb 01 21:40:18 [13253] ga1-ext cib: info: crm_client_destroy: Destroying 0 events
Feb 01 21:40:18 [13253] ga1-ext cib: info: crm_client_destroy: Destroying 0 events
Feb 01 21:40:18 [13253] ga1-ext cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Feb 01 21:40:18 [13253] ga1-ext cib: info: crm_client_destroy: Destroying 0 events
Feb 01 21:40:18 [13253] ga1-ext cib: info: qb_ipcs_us_withdraw: withdrawing server sockets
Feb 01 21:40:18 [13253] ga1-ext cib: info: crm_xml_cleanup: Cleaning up memory from libxml2
Feb 01 21:40:18 [13256] ga1-ext crmd: info: tengine_stonith_connection_destroy: Fencing daemon disconnected
Feb 01 21:40:18 [13256] ga1-ext crmd: notice: crmd_exit: Forcing immediate exit: Link has been severed (67)
Feb 01 21:40:18 [13256] ga1-ext crmd: info: crm_xml_cleanup: Cleaning up memory from libxml2
Feb 01 21:40:18 [25258] ga1-ext lrmd: info: cancel_recurring_action: Cancelling operation ClusterIP_monitor_30000
Feb 01 21:40:18 [25258] ga1-ext lrmd: warning: qb_ipcs_event_sendv: new_event_notification (25258-13256-6): Bad file descriptor (9)
Feb 01 21:40:18 [25258] ga1-ext lrmd: warning: send_client_notify: Notification of client crmd/0b3ea733-7340-439c-9f46-81b0d7e1f6a1 failed
Feb 01 21:40:18 [25258] ga1-ext lrmd: info: crm_client_destroy: Destroying 1 events
Feb 01 21:40:18 [25260] ga1-ext pengine: info: crm_client_destroy: Destroying 0 events
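When rings get marked FAULTY as above, their state can also be inspected per node with corosync-cfgtool (a sketch; the flags below are from corosync 1.x and require a running corosync, so the call is guarded):

```shell
# Sketch: inspect redundant-ring state on a node (corosync 1.x flags).
# -s prints the id and status of each configured ring; -r clears the
# FAULTY flag so the ring is re-tested. Guarded so it degrades cleanly
# on hosts where corosync is not running.
corosync-cfgtool -s 2>/dev/null || echo 'corosync not running here'

# After fixing the underlying problem, re-enable the faulty rings:
#   corosync-cfgtool -r
```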
--
Best regards,
Alessandro Bono

_______________________________________________
discuss mailing list
[email protected]
http://lists.corosync.org/mailman/listinfo/discuss
