2012/12/14 Muhammad Sharfuddin <[email protected]>
> node1(ailprd1) IP:192.168.7.11
> node2(ailprd2) IP:192.168.7.12
>
> Its a two node active/passive cluster, running perfectly since last two
> months, but yesterday both nodes were fenced(rebooted). Network
> connectivity b/w both nodes is perfect, and cluster is running fine
> again.
>
> Help me know the reason behind the following situation, and how can I
> avoid it happening next time:
>
> on node1(active node):
> Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed,
> forming new configuration.
> Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] New Configuration:
> Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.11)
> Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] Members Left:
> Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.12)
>
> on node2(passive node):
> Dec 13 12:31:05 ailprd2 corosync[7021]: [TOTEM ] A processor failed,
> forming new configuration.
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] New Configuration:
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.12)
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] Members Left:
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.11)
>
> for node1(ailprd1) node2 left, likewise node2(ailprd2) thinks that node1
> left. then node2 tries to start the resources which were already running
> on node1, and both nodes were fenced.
>
> corosync.conf :
> totem {
>     rrp_mode: none
>     join: 60
>     max_messages: 20
>     vsftype: none
>     consensus: 6000
>     secauth: off
>     token_retransmits_before_loss_const: 10
>     token: 5000
>     version: 2
>
>     interface {
>         bindnetaddr: 192.168.7.0
>         mcastaddr: 224.0.0.116
>         mcastport: 51234
>         ringnumber: 0
>     }
>     clear_node_high_bit: yes
> .../...

Which Corosync version are you running? 2.0, I guess.

Maybe try this on each node:

    tcpdump -i eth0 -envv "port 51234"

to see whether the cluster traffic gets through.

What does the following report?

    corosync-objctl | grep member     (if v1)
    corosync-cmapctl | grep member    (if v2)
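
If it helps when comparing the logs from both nodes, here is a minimal Python sketch that pulls out the addresses listed under "Members Left:". It is not part of corosync: the function name members_left and the SAMPLE list are made up for illustration, and it assumes the log lines look exactly like the [CLM ] lines quoted above.

```python
import re

# Matches entries such as "r(0) ip(192.168.7.12)" as they appear in the
# [CLM ] log lines quoted above (assumed format).
IP_RE = re.compile(r"r\(\d+\)\s+ip\(([\d.]+)\)")


def members_left(lines):
    """Return the IPs that follow a 'Members Left:' line in [CLM ] output."""
    left = []
    in_left = False
    for line in lines:
        if "[CLM" not in line:
            # Any non-CLM line ends the current "Members Left:" block.
            in_left = False
            continue
        if "Members Left:" in line:
            in_left = True
            continue
        match = IP_RE.search(line)
        if match and in_left:
            left.append(match.group(1))
        elif not match:
            # Another CLM header (e.g. "New Configuration:") ends the block.
            in_left = False
    return left


# Sample taken from the node1 log in the quoted message.
SAMPLE = [
    "Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed, forming new configuration.",
    "Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE",
    "Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] New Configuration:",
    "Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.11)",
    "Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] Members Left:",
    "Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.12)",
]

if __name__ == "__main__":
    print(members_left(SAMPLE))
```

Run against the node1 sample it returns ['192.168.7.12']. Feeding it the node2 log should give the opposite address, which is the symptom described above: each node saw the other one leave.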
