Re: [Linux-HA] why nodes cant see each other ?

Muhammad Sharfuddin Fri, 14 Dec 2012 02:26:31 -0800

ailprd1:~/Desktop # corosync-objctl |grep member
runtime.totem.pg.mrp.srp.members.185051328.ip=r(0) ip(192.168.7.11)
runtime.totem.pg.mrp.srp.members.185051328.join_count=1
runtime.totem.pg.mrp.srp.members.185051328.status=joined
runtime.totem.pg.mrp.srp.members.201828544.ip=r(0) ip(192.168.7.12) 
runtime.totem.pg.mrp.srp.members.201828544.join_count=1
runtime.totem.pg.mrp.srp.members.201828544.status=joined
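
For anyone who wants to check this membership output from a script, here is a minimal Python sketch. It assumes the key layout shown above (runtime.totem.pg.mrp.srp.members.<nodeid>.<field>=<value>). The helper names (parse_members, all_joined, nodeid_to_ip) are made up for illustration and are not part of corosync. The node-ID decoding is only an observation about this particular output: both IDs, read as little-endian 32-bit integers, match the ring-0 IPv4 addresses.

```python
import re
import socket
import struct
from collections import defaultdict

# Matches lines such as:
# runtime.totem.pg.mrp.srp.members.185051328.status=joined
MEMBER_RE = re.compile(
    r"^runtime\.totem\.pg\.mrp\.srp\.members\.(\d+)\.(\w+)=(.*)$"
)

SAMPLE = """\
runtime.totem.pg.mrp.srp.members.185051328.ip=r(0) ip(192.168.7.11)
runtime.totem.pg.mrp.srp.members.185051328.join_count=1
runtime.totem.pg.mrp.srp.members.185051328.status=joined
runtime.totem.pg.mrp.srp.members.201828544.ip=r(0) ip(192.168.7.12)
runtime.totem.pg.mrp.srp.members.201828544.join_count=1
runtime.totem.pg.mrp.srp.members.201828544.status=joined
"""


def parse_members(text):
    """Group the member keys by node ID: {nodeid: {field: value}}."""
    members = defaultdict(dict)
    for line in text.splitlines():
        match = MEMBER_RE.match(line.strip())
        if match:
            node_id, field, value = match.groups()
            members[node_id][field] = value.strip()
    return dict(members)


def all_joined(members, expected=2):
    """True if exactly `expected` members are listed and all are joined."""
    return len(members) == expected and all(
        fields.get("status") == "joined" for fields in members.values()
    )


def nodeid_to_ip(node_id):
    """Decode a node ID as a little-endian packed IPv4 address.

    Assumption: this matches the output above; it is not guaranteed
    for every corosync configuration.
    """
    return socket.inet_ntoa(struct.pack("<I", int(node_id)))


if __name__ == "__main__":
    members = parse_members(SAMPLE)
    for node_id, fields in sorted(members.items()):
        print(node_id, nodeid_to_ip(node_id), fields.get("status"))
    print("all joined:", all_joined(members))
```

To run it against a live node, the output of corosync-objctl (or corosync-cmapctl on v.2, whose key names may differ) would be passed to parse_members in place of SAMPLE.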


Also, tcpdump on bond0 shows corosync traffic flowing in both directions:

ailprd1:~/Desktop # tcpdump -i bond0 -envv "port 51234"
tcpdump: listening on bond0, link-type EN10MB (Ethernet), capture size
96 bytes
15:07:33.117378 00:10:18:9a:1e:7c > 01:00:5e:00:00:74, ethertype IPv4
(0x0800), length 124: (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto
UDP (17), length 110) 192.168.7.11.51233 > 224.0.0.116.51234: UDP,
length 82
15:07:33.299420 00:10:18:9a:1e:7c > 00:10:18:9a:21:c8, ethertype IPv4
(0x0800), length 112: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
proto UDP (17), length 98) 192.168.7.11.51233 > 192.168.7.12.51234: UDP,
length 70
15:07:33.299501 00:10:18:9a:21:c8 > 00:10:18:9a:1e:7c, ethertype IPv4
(0x0800), length 112: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
proto UDP (17), length 98) 192.168.7.12.51233 > 192.168.7.11.51234: UDP,
length 70
15:07:33.508558 00:10:18:9a:1e:7c > 01:00:5e:00:00:74, ethertype IPv4
(0x0800), length 124: (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto
UDP (17), length 110) 192.168.7.11.51233 > 224.0.0.116.51234: UDP,
length 82
15:07:33.690607 00:10:18:9a:1e:7c > 00:10:18:9a:21:c8, ethertype IPv4
(0x0800), length 112: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
proto UDP (17), length 98) 192.168.7.11.51233 > 192.168.7.12.51234: UDP,
length 70
.
.
.
15:07:56.768994 00:10:18:9a:21:c8 > 00:10:18:9a:1e:7c, ethertype IPv4
(0x0800), length 112: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
proto UDP (17), length 98) 192.168.7.12.51233 > 192.168.7.11.51234: UDP,
length 70
^C
183 packets captured
183 packets received by filter
0 packets dropped by kernel
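
A capture this long is easier to read as a per-flow summary. Here is a minimal Python sketch that counts UDP flows in tcpdump text like the above and checks whether two hosts are talking in both directions. It assumes the "src.port > dst.port: UDP" form printed by tcpdump -n for IPv4; the function names (count_flows, bidirectional) are illustrative only, and the sample is just the first three packets of the capture.

```python
import re
from collections import Counter

# Matches "192.168.7.11.51233 > 192.168.7.12.51234: UDP" (IPv4 only).
FLOW_RE = re.compile(
    r"(\d{1,3}(?:\.\d{1,3}){3})\.(\d+) > "
    r"(\d{1,3}(?:\.\d{1,3}){3})\.(\d+): UDP"
)

SAMPLE = """\
15:07:33.117378 00:10:18:9a:1e:7c > 01:00:5e:00:00:74, ethertype IPv4
(0x0800), length 124: (tos 0x0, ttl 1, id 0, offset 0, flags [DF], proto
UDP (17), length 110) 192.168.7.11.51233 > 224.0.0.116.51234: UDP,
length 82
15:07:33.299420 00:10:18:9a:1e:7c > 00:10:18:9a:21:c8, ethertype IPv4
(0x0800), length 112: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
proto UDP (17), length 98) 192.168.7.11.51233 > 192.168.7.12.51234: UDP,
length 70
15:07:33.299501 00:10:18:9a:21:c8 > 00:10:18:9a:1e:7c, ethertype IPv4
(0x0800), length 112: (tos 0x0, ttl 64, id 0, offset 0, flags [DF],
proto UDP (17), length 98) 192.168.7.12.51233 > 192.168.7.11.51234: UDP,
length 70
"""


def count_flows(capture_text):
    """Count packets per (source IP, destination IP) pair."""
    return Counter(
        (src, dst)
        for src, _sport, dst, _dport in FLOW_RE.findall(capture_text)
    )


def bidirectional(flows, a, b):
    """True if packets were seen both from a to b and from b to a."""
    return flows[(a, b)] > 0 and flows[(b, a)] > 0


if __name__ == "__main__":
    flows = count_flows(SAMPLE)
    for (src, dst), packets in sorted(flows.items()):
        print(f"{src} -> {dst}: {packets}")
    print("bidirectional:",
          bidirectional(flows, "192.168.7.11", "192.168.7.12"))
```

Running the same summary on a capture taken during an outage, on both nodes, would show which direction (unicast or multicast) stops first.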

--
Regards,

Muhammad Sharfuddin

On Fri, 2012-12-14 at 08:39 +0100, Emmanuel Saint-Joanis wrote:
> 2012/12/14 Muhammad Sharfuddin <[email protected]>
>         node1(ailprd1) IP:192.168.7.11
>         node2(ailprd2) IP:192.168.7.12
>         
>         It's a two-node active/passive cluster that has been running
>         perfectly for the last two months, but yesterday both nodes
>         were fenced (rebooted). Network connectivity between the two
>         nodes is fine, and the cluster is running normally again.
>         
>         Please help me understand the reason for the following
>         situation, and how I can prevent it from happening again:
>         
>         on node1(active node):
>         Dec 13 12:31:06 ailprd1 corosync[7274]: [TOTEM ] A processor failed, forming new configuration.
>         Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] CLM CONFIGURATION CHANGE
>         Dec 13 12:31:12 ailprd1 corosync[7274]: [CLM ] New Configuration:
>         Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.11)
>         Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] Members Left:
>         Dec 13 12:31:13 ailprd1 corosync[7274]: [CLM ] r(0) ip(192.168.7.12)
>         
>         on node2(passive node):
>         Dec 13 12:31:05 ailprd2 corosync[7021]: [TOTEM ] A processor failed, forming new configuration.
>         Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] CLM CONFIGURATION CHANGE
>         Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] New Configuration:
>         Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.12)
>         Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] Members Left:
>         Dec 13 12:31:11 ailprd2 corosync[7021]: [CLM ] r(0) ip(192.168.7.11)
>         
>         From node1's (ailprd1) point of view, node2 left; likewise,
>         node2 (ailprd2) thinks that node1 left. node2 then tried to
>         start the resources that were already running on node1, and
>         both nodes were fenced.
>         
>         corosync.conf :
>         totem {
>                 rrp_mode:       none
>                 join:   60
>                 max_messages:   20
>                 vsftype:        none
>                 consensus:      6000
>                 secauth:        off
>                 token_retransmits_before_loss_const:    10
>                 token:  5000
>                 version:        2
>         
>                 interface {
>                         bindnetaddr:    192.168.7.0
>                         mcastaddr:      224.0.0.116
>                         mcastport:      51234
>                         ringnumber:     0
>                 }
>                 clear_node_high_bit:    yes
> .../... 
> 
> 
> What's the Corosync version? 2.0, I guess.
> Maybe try this on each node:
> tcpdump -i eth0 -envv "port 51234"
> 
> 
> to see whether traffic gets through.
> What does this say?
> corosync-objctl  | grep member (if in v.1)
> corosync-cmapctl | grep member (if in v.2)
> 
> 

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
