On Fri, May 24, 2019 at 2:17 PM Jay Vosburgh <jay.vosbu...@canonical.com> wrote: > > Jarod Wilson <ja...@redhat.com> wrote: > > >Once in a while, with just the right timing, 802.3ad slaves will fail to > >properly initialize, winding up in a weird state, with a partner system > >mac address of 00:00:00:00:00:00. This started happening after a fix to > >properly track link_failure_count tracking, where an 802.3ad slave that > >reported itself as link up in the miimon code, but wasn't able to get a > >valid speed/duplex, started getting set to BOND_LINK_FAIL instead of > >BOND_LINK_DOWN. That was the proper thing to do for the general "my link > >went down" case, but has created a link initialization race that can put > >the interface in this odd state. > Are there any notification consequences because of this change?
> Reading back in the git history, the ultimate cause of this > "weird state" appears to be devices that assert NETDEV_UP prior to > actually being able to supply sane speed/duplex values, correct? > > Presuming that this is the case, I don't see that there's much > else to be done here, and so: > > Acked-by: Jay Vosburgh <jay.vosbu...@canonical.com> > > >The simple fix is to instead set the slave link to BOND_LINK_DOWN again, > >if the link has never been up (last_link_up == 0), so the link state > >doesn't bounce from BOND_LINK_DOWN to BOND_LINK_FAIL -- it hasn't failed > >in this case, it simply hasn't been up yet, and this prevents the > >unnecessary state change from DOWN to FAIL and getting stuck in an init > >failure w/o a partner mac. > > > >Fixes: ea53abfab960 ("bonding/802.3ad: fix link_failure_count tracking") > >CC: Jay Vosburgh <j.vosbu...@gmail.com> > >CC: Veaceslav Falico <vfal...@gmail.com> > >CC: Andy Gospodarek <a...@greyhouse.net> > >CC: "David S. Miller" <da...@davemloft.net> > >CC: net...@vger.kernel.org > >Tested-by: Heesoon Kim <heesoon....@stratus.com> > >Signed-off-by: Jarod Wilson <ja...@redhat.com> > > > > >--- > > drivers/net/bonding/bond_main.c | 15 ++++++++++----- > > 1 file changed, 10 insertions(+), 5 deletions(-) > > > >diff --git a/drivers/net/bonding/bond_main.c > >b/drivers/net/bonding/bond_main.c > >index 062fa7e3af4c..407f4095a37a 100644 > >--- a/drivers/net/bonding/bond_main.c > >+++ b/drivers/net/bonding/bond_main.c > >@@ -3122,13 +3122,18 @@ static int bond_slave_netdev_event(unsigned long > >event, > > case NETDEV_CHANGE: > > /* For 802.3ad mode only: > > * Getting invalid Speed/Duplex values here will put slave > >- * in weird state. So mark it as link-fail for the time > >- * being and let link-monitoring (miimon) set it right when > >- * correct speeds/duplex are available. > >+ * in weird state. Mark it as link-fail if the link was > >+ * previously up or link-down if it hasn't yet come up, and > >+ * let link-monitoring (miimon) set it right when correct > >+ * speeds/duplex are available. > > */ > > if (bond_update_speed_duplex(slave) && > >- BOND_MODE(bond) == BOND_MODE_8023AD) > >- slave->link = BOND_LINK_FAIL; > >+ BOND_MODE(bond) == BOND_MODE_8023AD) { > >+ if (slave->last_link_up) > >+ slave->link = BOND_LINK_FAIL; > >+ else > >+ slave->link = BOND_LINK_DOWN; > >+ } > > > > if (BOND_MODE(bond) == BOND_MODE_8023AD) > > bond_3ad_adapter_speed_duplex_changed(slave); > >-- > >2.20.1 > >