>-----Original Message----- >From: Jay Vosburgh [mailto:jay.vosbu...@canonical.com] >Sent: Monday, January 25, 2016 10:01 PM >To: zhuyj >Cc: mkube...@suse.cz; vfal...@gmail.com; go...@cumulusnetworks.com; >netdev@vger.kernel.org; Shteinbock, Boris (Wind River); Tantilov, Emil S >Subject: Re: [PATCH 1/1] bonding: Use notifiers for slave link state >detection > >zhuyj <zyjzyj2...@gmail.com> wrote: > >>On 01/26/2016 08:43 AM, Jay Vosburgh wrote: >>> <zyjzyj2...@gmail.com> wrote: >>> >>>> From: Zhu Yanjun <zyjzyj2...@gmail.com> >>>> >>>> Bonding will utilize notifier callbacks to detect slave >>>> link state changes. It is intended to be used with miimon >>>> set to zero, and does not support the updelay or downdelay >>>> options to bonding. >>>> >>>> Because of link flap from the slave interface, if the notifier >>>> is NETDEV_UP while the actual link state is down, it is not >>>> necessary to continue. >>>> >>>> Signed-off-by: Jay Vosburgh <jay.vosbu...@canonical.com> >>> I haven't signed off on this patch. >>> >>> I've just started some testing, but as before immediately get an >>> RCU warning; it looks to be coming from bond_miimon_inspect_slave(); >>> >>> [ 316.473050] bond1: Enslaving eth1 as a backup interface with an up >link >>> [ 316.473059] >>> [ 316.473806] =============================== >>> [ 316.475630] [ INFO: suspicious RCU usage. ] >>> [ 316.477519] 4.4.0+ #38 Not tainted >>> [ 316.479094] ------------------------------- >>> [ 316.480765] drivers/net/bonding/bond_main.c:2024 suspicious >rcu_dereference_check() usage! >>> >>> This is presumably because the "case NETDEV_DOWN" call to >>> bond_miimon_inspect_slave does not hold RCU. It does hold RTNL, though, >>> which should be safe for this usage (RTNL mutexes changes to the active >>> slave). The appended patch on top of the original makes the warning go >>> away. >>> >>> I'm still testing the patch and have no comment about its >>> functionality as yet. >>> >>> diff --git a/drivers/net/bonding/bond_main.c >b/drivers/net/bonding/bond_main.c >>> index 9f67948..e3faee9 100644 >>> --- a/drivers/net/bonding/bond_main.c >>> +++ b/drivers/net/bonding/bond_main.c >>> @@ -2014,14 +2014,14 @@ static int bond_slave_info_query(struct >net_device *bond_dev, struct ifslave *in >>> /*-------------------------------- Monitoring >>> -------------------------------*/ >>> -/* called with rcu_read_lock() */ >>> +/* called with rcu_read_lock() or RTNL */ >>> static int bond_miimon_inspect_slave(struct bonding *bond, struct >slave *slave, >>> unsigned long event) >>> { >>> int link_state; >>> bool ignore_updelay; >>> - ignore_updelay = !rcu_dereference(bond->curr_active_slave); >>> + ignore_updelay = !rcu_dereference_rtnl(bond->curr_active_slave); >> >>Thanks a lot. >>Because kernel v4.4 needs this kind of patch, I backport this patch from >>net-next to kernel v4.4. >> >>If it is not appropriate, I will revert this patch. > > I don't understand what you mean here. > > I've tested the patch (with my above modification), and while I >seem to be hitting an unrelated bug in the ARP monitor, I believe this >patch will misbehave when the ARP monitor is running. > > For example, if arp_interval=1000 and miimon=0, the link state >notifier callback will change a slave to up should a notifier event take >place. So, hypothetically, if a slave is "down" according to the ARP >monitor (but actually carrier up), and then experience a carrier down >then up transition, the slave would be set to "up" even though the ARP >monitor believes it to be down. > > I'm not able to induce the speedy link flap events, so I'm not >sure about this portion of the patch: > >+ /* Because of link flap from the slave interface, it is possilbe that >+ * the notifiler is NETDEV_UP while the actual link state is down. If >+ * so, it is not necessary to contiune. >+ */ >+ switch (event) { >+ case NETDEV_UP: >+ if (!link_state) >+ return 0; >+ break; >+ >+ case NETDEV_DOWN: >+ if (link_state) >+ return 0; >+ break; >+ } >+ > > Unless I misunderstood, Emil's comments elsewhere suggest that >the current ixgbe driver won't cause those, though.
I ran tests with the above checks and I can't get them to trigger either way. So at least in my setup this patch has no effect. Thanks, Emil