On 07/24/2017 12:32 PM, Florian Fainelli wrote: > On 07/24/2017 12:13 PM, Mason wrote: >> On 24/07/2017 18:49, Florian Fainelli wrote: >> >>> On 07/24/2017 08:01 AM, Mason wrote: >>> >>>>> When I set the link down via 'ip link set eth0 down' >>>>> (as opposed to pulling the Ethernet cable) things don't happen as >>>>> expected: >>>>> >>>>> The driver's adjust_link() callback is never called, and doesn't >>>>> get a chance make some required changes. And when I set the link >>>>> up again, there is no network connectivity. >>>>> >>>>> I get this problem only if I enable interrupts on my PHY. >>>>> If I use polling, things work as expected. >>>>> >>>>> >>>>> When I set the link down, devinet_ioctl() eventually calls >>>>> ndo_set_rx_mode() and ndo_stop() >>>>> >>>>> In ndo_stop() the driver calls >>>>> phy_stop(phydev); >>>>> which disables interrupts and sets the state to HALTED. >>>>> >>>>> In phy_state_machine() >>>>> the PHY_HALTED case does call the adjust_link() callback: >>>>> >>>>> if (phydev->link) { >>>>> phydev->link = 0; >>>>> netif_carrier_off(phydev->attached_dev); >>>>> phy_adjust_link(phydev); >>>>> do_suspend = true; >>>>> } >>>>> >>>>> But it's not called when I use interrupts... >>>>> >>>>> Perhaps because there are no interrupts generated? >>>>> Or even if there were, they have been turned off by phy_stop? >>>>> >>>>> Basically, it seems like when I use interrupts, >>>>> the phy_state_machine() is not called on link down, >>>>> which breaks the MAC driver's expectations. >>>>> >>>>> Am I barking up the wrong tree? >>>> >>>> FWIW, the patch below solves my issue. >>>> Basically, we reset the MAC in open(), instead of probe(). >>>> >>>> I also had to solve the issue of adjust_link() not being >>>> called by calling it explicitly in stop() instead of >>>> relying on phy_stop() to do it indirectly. >>> >>> Which is of course absolutely not how it is intended to be used. >>> phy_stop() does the following: >>> >>> - if the PHY was already HALTED do nothing and exit >>> - if it was not and an interrupt is valid for this PHY: disable and >>> clear these interrupts >>> - set state to PHY_HALTED >>> >>> somehow an interrupt should be generated from doing this such that >>> phy_change(), invoked from phy_interrupt() should have a chance to run >>> and make the PHY state machine transition properly to PHY_HALTED. >> >> I'm totally confused. Are you saying that phy_stop itself >> should trigger an interrupt, or that the process of setting >> the link down should generate an interrupt *before* we reach >> phy_stop? > > My reading of the code, and because I don't actually have a system where > PHY interrupts proper are used (only polling or PHY_IGNORE INTERRUPT) is > that, yes, somehow calling phy_stop() should result in a PHY interrupt > to be generated making the state machine move to PHY_HALTED. > >> >> I'm also perplex over this synchronous IRQ business. >> Should I be looking for a way to trigger an IRQ in >> software in the Atheros PHY? > > No, first understand the problem and what is going on before trying to > workaround things in the PHY driver, there were questions for you as to > what state the PHY state machine is left in we need to see that to > understand how to possibly fix what you are seeing. > >> >> Before I forget: there is also an issue when using the PHY >> in polling mode. The ndo_stop callback runs through phy_stop >> and phy_disconnect too fast for the adjust_link() callback >> to be called. My patch fixed that too, by calling >> nb8800_link_reconfigure() explicitly. > > Most, if not all drivers should have this: > > ndo_open() calls phy_connect() or phy_attach() + phy_start() because > that allows you to properly manage the PHY's power state and the state > machine, the reciprocal is to have ndo_stop() call phy_disconnect() (and > just that) which properly waits for the PHY state machine to be fully > stopped. > > phy_stop() returns immediately but the PHY state machine only gets > stopped asynchronously at a later time, either with an interrupt or with > an explicit work queue scheduling. If you call phy_disconnect() right > after, this cancels the work queue and it may not have run the > adjust_link callback yet. > >> >> >>> So from there can you check a few things: >>> >>> - is such an interrupt actually generated? >>> - if you turn on dynamic debug prints for drivers/net/phy/phy.c where do >>> we leave the PHY state machine and what state is it in when you call >>> ifconfig up again? >> >> The only interrupts I've ever seen the PHY generate are >> on plugging/unplugging the Ethernet cable. >> >> Looking at the driver and datasheet... >> http://elixir.free-electrons.com/linux/v4.13-rc2/source/drivers/net/phy/at803x.c#L312 >> value |= AT803X_INTR_ENABLE_AUTONEG_ERR; >> value |= AT803X_INTR_ENABLE_SPEED_CHANGED; >> value |= AT803X_INTR_ENABLE_DUPLEX_CHANGED; >> value |= AT803X_INTR_ENABLE_LINK_FAIL; >> value |= AT803X_INTR_ENABLE_LINK_SUCCESS; >> >> And the interrupts reasons supported by the PHY are: >> #define AT803X_INTR_ENABLE_AUTONEG_ERR BIT(15) >> #define AT803X_INTR_ENABLE_SPEED_CHANGED BIT(14) >> #define AT803X_INTR_ENABLE_DUPLEX_CHANGED BIT(13) >> #define AT803X_INTR_ENABLE_PAGE_RECEIVED BIT(12) >> #define AT803X_INTR_ENABLE_LINK_FAIL BIT(11) >> #define AT803X_INTR_ENABLE_LINK_SUCCESS BIT(10) >> #define AT803X_INTR_ENABLE_WIRESPEED_DOWNGRADE BIT(5) >> #define AT803X_INTR_ENABLE_POLARITY_CHANGED BIT(1) >> #define AT803X_INTR_ENABLE_WOL BIT(0) >> >> These all seem to be external reasons (from the peer). >> >> I did enable debug logs in drivers/net/phy/phy.c >> to trace the state machine, and it is not called >> at all on set link down, so it remains in state >> RUNNING (both in polling and interrupt modes).
Well now that I see the possible interrupts generated, I indeed don't see how you can get a link down notification unless you somehow force the link down yourself, which would certainly happen in phy_suspend() when we set BMCR.pwrdwn, but that may be too late. You should still expect the adjust_link() function to be called though with PHY_HALTED being set and that takes care of doing phydev->link = 0 and netif_carrier_off(). If that still does not work, then see whether removing the call to phy_stop() does help (it really should). -- Florian