Zoltan,

Sorry it took a while to get back to you. I am just coming up to speed on the OVS LACP implementation, so my understanding may not be correct. Please feel free to point out anything I have gotten wrong.

According to Wikipedia's MC-LAG entry, there is no standard for it; implementations are mostly designed and built by individual vendors. After reading through the commit message and comparing it with the 802.1AX spec, this looks to me like a bug or a configuration issue in the MC-LAG implementation. When the partner on port A comes back, should it wait for the MC-LAG sync to complete before exchanging state with OVS, instead of using the default profile?

Andy

On Mon, Jul 14, 2014 at 3:11 PM, Ben Pfaff <b...@nicira.com> wrote:
> On Tue, Jul 08, 2014 at 05:35:57PM +0100, Zoltan Kiss wrote:
>> This patch modifies the LACP selection logic by prefering a slaves with up and
>> running partners when looking for a lead.
>> That fixes the following scenario:
>> - bond has 2 ports, A and B, their other ends are in separate chassis with
>> MC-LAG sync
>> - the partner of port A is restarted
>> - port B is still working
>> - the partner on port A comes back, but temporarily it is using a default
>> config, as MC-LAG haven't synced yet
>> - apparently that default config has a sys_priority which is smaller than the
>> other, still running port, plus completely different sys_id
>> - therefore OVS choose port A despite it won't ever comes up into
>> collecting-distributing state
>> - and port B is disabled, causing the whole bond goes down
>>
>> Checking through the 802.1ax standard, when port A comes up again, the two
>> links fall apart due to the different LAG IDs. They should be attached to
>> different Aggregators, and the Aggregators should live separately. In OVS there
>> is no such concept as Aggregator, but I think it should be said that it has only
>> one Aggregator, and it has an unique policy to choose which ports can join.
>> Although changing the chassis' default config can also fix this, detecting
>> such problems quite hard, therefore I think it is still valid to improve things
>> in OVS side.
>> Btw. the Linux kernel bonding drivers' LACP implementation allows more
>> aggregators, and therefore it could handle this situation properly.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.k...@citrix.com>
>
> I verified that the unit tests still pass with this applied.
>
> Andy Zhou said he'd review the patch.

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
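
For readers following the thread, the scenario in the quoted commit message can be sketched as a toy model. This is only an illustration under stated assumptions, not the OVS code: the `Slave` class, the `partner_running` flag, `pick_lead()`, `attached()`, and all priority and ID values are hypothetical names and numbers chosen for the example. The only things taken from the thread are that a smaller sys_priority wins and that the patch prefers slaves whose partner is up and running.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Slave:
    """Toy model of one bond member (hypothetical, not an OVS structure)."""
    name: str
    partner_sys_priority: int  # smaller value wins the comparison
    partner_sys_id: str
    partner_running: bool      # assumed flag: partner is up and in sync


def pick_lead(slaves: Sequence[Slave], prefer_running: bool) -> Optional[Slave]:
    """Pick the lead slave.

    With prefer_running=False only (sys_priority, sys_id) is compared.
    With prefer_running=True, slaves with a running partner sort first,
    which is the behaviour the commit message describes.
    """
    if not slaves:
        return None

    def key(s: Slave):
        rank = (s.partner_sys_priority, s.partner_sys_id)
        if prefer_running:
            # False sorts before True, so running partners come first.
            return (not s.partner_running,) + rank
        return rank

    return min(slaves, key=key)


def attached(slaves: Sequence[Slave], lead: Slave) -> List[str]:
    """Single-aggregator model: only slaves whose partner sys_id matches
    the lead's partner stay attached."""
    return [s.name for s in slaves if s.partner_sys_id == lead.partner_sys_id]


# Port A's partner has restarted with a default config (illustrative values):
# smaller sys_priority, different sys_id, and not yet running.
ports = [
    Slave("A", partner_sys_priority=1, partner_sys_id="00:00:00:00:00:01",
          partner_running=False),
    Slave("B", partner_sys_priority=100, partner_sys_id="aa:bb:cc:dd:ee:ff",
          partner_running=True),
]

old_lead = pick_lead(ports, prefer_running=False)
new_lead = pick_lead(ports, prefer_running=True)
print("lead without preference:", old_lead.name, "attached:", attached(ports, old_lead))
print("lead with preference:   ", new_lead.name, "attached:", attached(ports, new_lead))
```

In this model, comparing priority alone selects port A and drops port B, even though A's partner is not running. Adding the running-partner preference selects port B instead, so the bond keeps its one working link.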