node1 (ailprd1) IP: 192.168.7.11
node2 (ailprd2) IP: 192.168.7.12
It's a two-node active/passive cluster that has been running perfectly
for the last two months, but yesterday both nodes were fenced (rebooted).
Network connectivity between the two nodes is fine, and the cluster is
running normally again.
Please help me understand what caused the following situation, and how I
can avoid it happening again:
On node1 (active node):
Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed, forming new configuration.
Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE
Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] New Configuration:
Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.11)
Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] Members Left:
Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.12)
On node2 (passive node):
Dec 13 12:31:05 ailprd2 corosync[7021]: [TOTEM ] A processor failed, forming new configuration.
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] CLM CONFIGURATION CHANGE
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] New Configuration:
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.12)
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] Members Left:
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.11)
From node1's (ailprd1) point of view, node2 left; likewise, node2
(ailprd2) thinks that node1 left. Node2 then tried to start the resources
that were already running on node1, and both nodes were fenced.
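
To make the two views easier to compare, here is a minimal Python sketch that reads syslog lines like the ones above and lists, per host, which addresses appeared under "Members Left:". The helper names (`members_left`, `both_sides_lost_peer`) are made up for illustration and are not part of corosync; the regular expressions assume the syslog format shown in the excerpts, and the host-to-IP mapping is taken from the top of this mail.

```python
import re

# Assumed syslog layout: "<Mon> <day> <time> <host> corosync[<pid>]: [<SVC> ] <message>"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(?P<host>\S+)\s+corosync\[\d+\]:\s+"
    r"\[(?P<svc>\w+)\s*\]\s+(?P<msg>.*)$"
)
IP_RE = re.compile(r"ip\((?P<ip>[\d.]+)\)")

# Node addresses as given at the top of this mail.
HOST_IPS = {"ailprd1": "192.168.7.11", "ailprd2": "192.168.7.12"}

SAMPLE_LOG = """\
Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed, forming new configuration.
Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE
Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] New Configuration:
Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.11)
Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] Members Left:
Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.12)
Dec 13 12:31:05 ailprd2 corosync[7021]: [TOTEM ] A processor failed, forming new configuration.
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] CLM CONFIGURATION CHANGE
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] New Configuration:
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.12)
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] Members Left:
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.11)
"""


def members_left(log_text):
    """Return {host: [ip, ...]} for addresses listed under 'Members Left:'."""
    left = {}
    section = {}  # most recent "<name>:" header seen, tracked per host
    for raw in log_text.splitlines():
        m = LINE_RE.match(raw.strip())
        if not m or m.group("svc") != "CLM":
            continue
        host, msg = m.group("host"), m.group("msg").strip()
        if msg.endswith(":"):
            section[host] = msg[:-1]
            continue
        ip = IP_RE.search(msg)
        if ip and section.get(host) == "Members Left":
            left.setdefault(host, []).append(ip.group("ip"))
    return left


def both_sides_lost_peer(left, host_ips):
    """True if every host reported at least one address other than its own as left."""
    return all(
        any(ip != host_ips[host] for ip in left.get(host, []))
        for host in host_ips
    )


if __name__ == "__main__":
    view = members_left(SAMPLE_LOG)
    for host, ips in sorted(view.items()):
        print(host, "saw leave:", ", ".join(ips))
    print("both sides lost their peer:", both_sides_lost_peer(view, HOST_IPS))
```

Run against the excerpts above, each host reports the other node's address as having left, which is the split membership view described in the previous paragraph.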
corosync.conf:
totem {
    rrp_mode: none
    join: 60
    max_messages: 20
    vsftype: none
    consensus: 6000
    secauth: off
    token_retransmits_before_loss_const: 10
    token: 5000
    version: 2
    interface {
        bindnetaddr: 192.168.7.0
        mcastaddr: 224.0.0.116
        mcastport: 51234
        ringnumber: 0
    }
    clear_node_high_bit: yes
}
logging {
    to_logfile: no
    to_syslog: yes
    debug: off
    timestamp: off
    to_stderr: no
    fileline: off
    syslog_facility: daemon
}
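
As a rough cross-check of the timing values in this configuration, here is a minimal Python sketch that pulls `token` and `consensus` out of the `totem` block and compares them. It assumes the rule, as I read corosync.conf(5), that `consensus` should be at least 1.2 times `token`; please verify that against the man page for your corosync version. `parse_totem` and `check_timing` are hypothetical helpers written for this mail, not corosync tools, and the parser only handles the simple `key: value` layout shown above.

```python
CONF = """\
totem {
    rrp_mode: none
    join: 60
    max_messages: 20
    vsftype: none
    consensus: 6000
    secauth: off
    token_retransmits_before_loss_const: 10
    token: 5000
    version: 2
    interface {
        bindnetaddr: 192.168.7.0
        mcastaddr: 224.0.0.116
        mcastport: 51234
        ringnumber: 0
    }
    clear_node_high_bit: yes
}
logging {
    to_syslog: yes
}
"""


def parse_totem(conf_text):
    """Collect the 'key: value' pairs that sit directly inside totem { }."""
    values = {}
    depth = 0
    in_totem = False
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith("{"):
            if depth == 0 and line[:-1].strip() == "totem":
                in_totem = True
            depth += 1
            continue
        if line == "}":
            depth -= 1
            if depth == 0:
                in_totem = False
            continue
        # depth == 1 skips nested sub-blocks such as interface { }
        if in_totem and depth == 1 and ":" in line:
            key, value = line.split(":", 1)
            values[key.strip()] = value.strip()
    return values


def check_timing(totem):
    """Compare consensus with the assumed minimum of 1.2 * token (all in ms)."""
    token = int(totem["token"])
    consensus = int(totem["consensus"])
    min_consensus = (token * 12 + 9) // 10  # ceil(1.2 * token) in integer math
    return {
        "token_ms": token,
        "consensus_ms": consensus,
        "min_consensus_ms": min_consensus,
        "consensus_ok": consensus >= min_consensus,
    }


if __name__ == "__main__":
    print(check_timing(parse_totem(CONF)))
```

With the values above, `consensus` sits exactly at the assumed minimum for a 5000 ms `token`, so the check passes but leaves no margin.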
Regards,
Muhammad Sharfuddin