2012/12/14 Muhammad Sharfuddin <[email protected]>

> node1(ailprd1) IP:192.168.7.11
> node2(ailprd2) IP:192.168.7.12
>
> It's a two-node active/passive cluster that has been running perfectly for
> the last two months, but yesterday both nodes were fenced (rebooted).
> Network connectivity between the two nodes is fine, and the cluster is
> running normally again.
>
> Please help me understand the cause of the following situation, and how I
> can avoid it next time:
>
> on node1(active node):
> Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed,
> forming new configuration.
> Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] New Configuration:
> Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.11)
> Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] Members Left:
> Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.12)
>
> on node2(passive node):
> Dec 13 12:31:05 ailprd2 corosync[7021]: [TOTEM ] A processor failed,
> forming new configuration.
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] CLM CONFIGURATION CHANGE
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] New Configuration:
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.12)
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] Members Left:
> Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.11)
>
> From node1's (ailprd1) point of view node2 left; likewise node2 (ailprd2)
> thinks that node1 left. Node2 then tried to start the resources that were
> already running on node1, and both nodes were fenced.
>
> corosync.conf :
> totem {
>         rrp_mode:       none
>         join:   60
>         max_messages:   20
>         vsftype:        none
>         consensus:      6000
>         secauth:        off
>         token_retransmits_before_loss_const:    10
>         token:  5000
>         version:        2
>
>         interface {
>                 bindnetaddr:    192.168.7.0
>                 mcastaddr:      224.0.0.116
>                 mcastport:      51234
>                 ringnumber:     0
>         }
>         clear_node_high_bit:    yes
>
.../...
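
On the quoted totem settings: as far as I remember from corosync.conf(5), consensus must be at least 1.2 * token, and 6000 against 5000 sits exactly on that minimum. Below is a minimal Python sketch of that check, assuming the quoted values are the ones actually in use. The names parse_totem and consensus_ok are made up for illustration and are not part of any Corosync tool.

```python
import re

# Timing-related lines copied from the quoted corosync.conf above.
QUOTED_TOTEM = """\
> totem {
>         consensus:      6000
> token_retransmits_before_loss_const:    10
>         token:  5000
"""


def parse_totem(text):
    """Collect 'key: value' pairs, ignoring the mail quote prefix."""
    values = {}
    for line in text.splitlines():
        # Drop the leading '>' quote marker and surrounding whitespace.
        line = line.lstrip("> \t").strip()
        match = re.match(r"(\w+):\s*(\S+)$", line)
        if match:
            values[match.group(1)] = match.group(2)
    return values


def consensus_ok(values):
    """True when consensus >= 1.2 * token (integer arithmetic, in ms)."""
    token = int(values["token"])
    consensus = int(values["consensus"])
    return consensus * 10 >= token * 12


totem = parse_totem(QUOTED_TOTEM)
print(totem, consensus_ok(totem))
```

The check passes for the quoted values, so on its own this does not explain the fencing; it only rules out one misconfiguration.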

Which Corosync version is this? 2.0, I guess.
You could try this on each node:
tcpdump -i eth0 -envv "port 51234"

to see whether the cluster traffic gets through.
What does this report?
corosync-objctl  | grep member (if in v.1)
corosync-cmapctl | grep member (if in v.2)
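
To compare the timing on both nodes, here is a minimal Python sketch that measures the gap between "A processor failed" and the following "CLM CONFIGURATION CHANGE" for each host. It is only an illustration: the log lines are the ones quoted above (with the wrapped lines rejoined), the function name membership_gaps is made up, and it assumes both events fall on the same day so only the clock time is parsed.

```python
from datetime import datetime

# Log lines taken from the quoted message, wrapped lines rejoined.
LOG = """\
Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed, forming new configuration.
Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE
Dec 13 12:31:05 ailprd2 corosync[7021]: [TOTEM ] A processor failed, forming new configuration.
Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] CLM CONFIGURATION CHANGE
"""


def membership_gaps(log):
    """Seconds between 'processor failed' and the first CLM change, per host."""
    failed_at = {}
    gaps = {}
    for line in log.splitlines():
        parts = line.split(None, 4)  # month, day, time, host, rest
        if len(parts) < 5:
            continue
        _month, _day, clock, host, rest = parts
        stamp = datetime.strptime(clock, "%H:%M:%S")
        if "A processor failed" in rest:
            failed_at[host] = stamp
        elif ("CLM CONFIGURATION CHANGE" in rest
              and host in failed_at and host not in gaps):
            gaps[host] = (stamp - failed_at[host]).total_seconds()
    return gaps


print(membership_gaps(LOG))
```

For the quoted lines both gaps come out at 6 seconds, which lines up with consensus: 6000 in the posted corosync.conf. If I read it right, that suggests both nodes stopped receiving the token at about the same moment, so the tcpdump output around such an event would be the interesting part.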
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
