On 10-08-17 07:39, Kalle Valo wrote:
Hi Mahesh and Andy,

James Feeney reported that there's a serious regression in bonding
module since v4.12, it doesn't work with wireless drivers anymore as
wireless drivers don't report the link speed via ethtool:

https://bugzilla.kernel.org/show_bug.cgi?id=196547

In the bug report it's said that this commit is the culprit:

3f3c278c94dd bonding: fix active-backup transition

This commit references another one. ie. commit c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state"). Before this commit the result of __ethtool_get_link_ksettings() was simply ignored.

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -365,9 +365,10 @@ int bond_set_carrier(struct bonding *bond)
 /* Get link speed and duplex from the slave's base driver
  * using ethtool. If for some reason the call fails or the
  * values are invalid, set speed and duplex to -1,
- * and return.
+ * and return. Return 1 if speed or duplex settings are
+ * UNKNOWN; 0 otherwise.
  */
-static void bond_update_speed_duplex(struct slave *slave)
+static int bond_update_speed_duplex(struct slave *slave)
 {
        struct net_device *slave_dev = slave->dev;
        struct ethtool_link_ksettings ecmd;
@@ -377,24 +378,27 @@ static void bond_update_speed_duplex(struct slave *slave)
        slave->duplex = DUPLEX_UNKNOWN;

        res = __ethtool_get_link_ksettings(slave_dev, &ecmd);
-       if (res < 0)
-               return;
-
-       if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1))
-               return;
-
+       if (res < 0) {
+               slave->link = BOND_LINK_DOWN;
+               return 1;
+       }
+       if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) {
+               slave->link = BOND_LINK_DOWN;
+               return 1;
+       }

Commit 3f3c278c94dd ("bonding: fix active-backup transition") moves setting the link state to the call sites of bond_update_speed_duplex(), just not all call sites.

Is there a fix for this or should that commit be reverted? This seems to
be a serious regression as there are multiple reports already and we
should get it fixed for v4.13, and the fix backported to v4.12 stable
release.

The ethtool callbacks really seem optional. At least in brcmfmac, the wireless driver I maintain, I only provide get_drvinfo callback and there is no warning triggered upon registering the netdev. The changes above now require each netdev to implement the get_link_ksettings callback (get_settings is deprecated) or the link is marked as DOWN ruling it out to be used as active bond slave. To the end-users who were using bonding this is simply a regression. So to fix that both changes should be reverted in my opinion.

Now specifically for wireless interfaces we could implement get_link_ksettings callback although most of the fields requested are meaningless in wireless context. Regarding the speed and half-duplex values we raised some concerns in an earlier discussion with James. Wireless is always half-duplex as there can be only one (unintended ref to [1]). If the reported speed in wifi is difficult. In wifi we have txrate and rxrate which are inherently asynchronous and it is a per-packet value so it is going to change a lot. Seeing only 4 call sites in the bonding code tells me that is not taken into account. All in all this shenanigan seems netconf material to me.

Regards,
Arend

[1] https://en.wikipedia.org/wiki/Highlander_(film)

Reply via email to