Hi Fabio,
Thank you for your comments.
We reproduced this problem in a physical environment as well.
Corosync communication runs over eth1 and eth2:
-------------------------------------------------------
[root@bl460g6a ~]# ip addr show
(snip)
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether f4:ce:46:b3:fe:3c brd ff:ff:ff:ff:ff:ff
inet 192.168.101.9/24 brd 192.168.101.255 scope global eth1
inet6 fe80::f6ce:46ff:feb3:fe3c/64 scope link
valid_lft forever preferred_lft forever
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
link/ether 18:a9:05:78:6c:f0 brd ff:ff:ff:ff:ff:ff
inet 192.168.102.9/24 brd 192.168.102.255 scope global eth2
inet6 fe80::1aa9:5ff:fe78:6cf0/64 scope link
valid_lft forever preferred_lft forever
(snip)
8: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
link/ether 52:54:00:7f:f3:0a brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 500
link/ether 52:54:00:7f:f3:0a brd ff:ff:ff:ff:ff:ff
-----------------------------------------------
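For reference, our totem configuration binds one ring to each of these subnets.
A minimal sketch (rrp_mode, multicast addresses, and ports below are illustrative
assumptions, not copied from our actual config):
-------------------------------------------------------
totem {
    version: 2
    rrp_mode: passive                # illustrative; active is also possible
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.101.0   # eth1 subnet
        mcastaddr: 239.255.1.1       # assumed multicast address
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.102.0   # eth2 subnet
        mcastaddr: 239.255.2.1       # assumed multicast address
        mcastport: 5405
    }
}
-------------------------------------------------------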
So I do not think this is a problem specific to the virtual environment.
Just to be sure, I have attached the logs I collected on three blade servers (RHEL 6.4).
* This time I cut the communication at a network switch instead of at the KVM host.
The phenomenon is the same: one node keeps looping through membership states and
repeatedly returns to the OPERATIONAL state, while the other two nodes never settle
into the OPERATIONAL state.
Is this, after all, the same problem as the bug you pointed me to?
> Check this thread as reference:
> http://lists.linuxfoundation.org/pipermail/openais/2013-April/016792.html
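If it is that bridge bug, then my understanding from the thread (an assumption on
my part, we have not verified it yet) is that disabling IGMP snooping on the KVM
host bridges is one way to test it:
-------------------------------------------------------
# 1 means the bridge snoops IGMP and may fail to forward corosync multicast
cat /sys/class/net/virbr2/bridge/multicast_snooping
# disable snooping as a test (and likewise for virbr3)
echo 0 > /sys/class/net/virbr2/bridge/multicast_snooping
-------------------------------------------------------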
Best Regards,
Hideo Yamauchi.
--- On Fri, 2013/5/31, Fabio M. Di Nitto <[email protected]> wrote:
> On 5/31/2013 7:12 AM, [email protected] wrote:
> > Hi All,
> >
> > We found a problem with the network communication of corosync.
> >
> > We built a three-node corosync cluster on KVM.
> >
> > Step 1) Start the corosync service on all nodes.
> >
> > Step 2) Confirm that the cluster has formed with all nodes and has reached
> > the OPERATIONAL state.
> >
> > Step 3) Cut off the network of node1 (rh64-coro1) and node2 (rh64-coro2) from
> > the KVM host.
> >
> > [root@kvm-host ~]# brctl delif virbr3 vnet5; brctl delif virbr2 vnet1
> > (this detaches each guest's vnet tap device from its bridge, cutting that link)
> >
> > Step 4) Because the problem occurred, we stopped all nodes.
> >
> >
> > The problem occurs at step 3.
> >
> > One node (rh64-coro1) keeps cycling through membership states, repeatedly
> > returning to the OPERATIONAL state.
> >
> > The other two nodes (rh64-coro2 and rh64-coro3) keep changing state and never
> > seem to reach the OPERATIONAL state while the first node is running.
> >
> > This means that the two nodes (rh64-coro2 and rh64-coro3) cannot complete
> > cluster membership.
> > When this network trouble happens in a setup where corosync is combined with
> > Pacemaker, corosync cannot notify Pacemaker of the membership change of the
> > cluster.
> >
> >
> > Question 1) Are there any parameters in corosync.conf that would solve this problem?
> > * We think we could work around it by bonding the two interfaces into one and
> > setting "rrp_mode: none", but we would prefer not to give up redundant rings;
> > see the sketch below.
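> >
> > For illustration only (bond0 and the addresses are my assumptions, not a config
> > we have deployed), that single-ring fallback would look like:
> >
> > totem {
> >     version: 2
> >     rrp_mode: none
> >     interface {
> >         ringnumber: 0
> >         bindnetaddr: 192.168.101.0   # subnet of bond0 = eth1 + eth2
> >         mcastaddr: 239.255.1.1       # assumed multicast address
> >         mcastport: 5405
> >     }
> > }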
> >
> > Question 2) Is this a bug? Or is it expected behavior of corosync's
> > communication?
>
> We already checked this specific test, and it appears to be a bug in
> the kernel bridge code when handling multicast traffic (groups are not
> joined correctly and traffic is not forwarded).
>
> Check this thread as reference:
> http://lists.linuxfoundation.org/pipermail/openais/2013-April/016792.html
>
> Thanks
> Fabio
>
>
_______________________________________________
discuss mailing list
[email protected]
http://lists.corosync.org/mailman/listinfo/discuss