On 07/24/2017 03:53 PM, Mason wrote: > On 25/07/2017 00:36, Florian Fainelli wrote: >> On 07/24/2017 02:20 PM, Mason wrote: >>> On 24/07/2017 21:53, Florian Fainelli wrote: >>> >>>> Well now that I see the possible interrupts generated, I indeed don't >>>> see how you can get a link down notification unless you somehow force >>>> the link down yourself, which would certainly happen in phy_suspend() >>>> when we set BMCR.pwrdwn, but that may be too late. >>>> >>>> You should still expect the adjust_link() function to be called though >>>> with PHY_HALTED being set and that takes care of doing phydev->link = 0 >>>> and netif_carrier_off(). If that still does not work, then see whether >>>> removing the call to phy_stop() does help (it really should). >>> >>> The only functions setting phydev->state to PHY_HALTED >>> are phy_error() and phy_stop() AFAICT. >>> >>> I am aware that when phy_state_machine() handles the >>> PHY_HALTED state, it will set phydev->link = 0; >>> and call netif_carrier_off() -- because that's where >>> I copied that code from. >>> >>> My issue is that phy_state_machine() does not run when >>> I run 'ip set link dev eth0 down' from the command line. >> >> Yes, that much is clear, which is why I suggested earlier you try the >> patch at the end now. >> >>> >>> If I'm reading the code right, phy_disconnect() actually >>> stops the state machine. >>> >>> In interrupt mode, phy_state_machine() doesn't run >>> because no interrupt is generated. >>> >>> In polling mode, phy_state_machine() doesn't run >>> because phy_disconnect() stops the state machine. >>> >>> Introducing a sleep before phy_disconnect() gives >>> the state machine a chance to run in polling mode, >>> but it doesn't feel right, and doesn't fix the >>> other mode, which I'm using. >> >> There are several problems it seems: >> >> - the PHY state machine cannot move solely based on the PHY generating >> interrupts during phy_stop() if none of the changing conditions for the >> HW have changed, come to think about it, I doubt any PHY would be >> capable of doing something like that >> >> - there is an expectation from your driver to have adjust_link() run >> sometime during ndo_stop() to do something, but why? >> >> What is special about nb8800 that interrupts should be generated during >> ndo_stop(), and why do you think this is going to solve your problem? >> >>> >>> Looking at bcm_enet_stop() it calls phy_stop() and >>> phy_disconnect() just like the nb8800 driver... >>> >>> I'm stumped. >> >> My suggestion of not using phy_stop() was not a good one, but >> functionally there is little difference in doing phy_stop() + >> phy_disconnect() or just phy_disconnect() alone. What matters is that we >> restart the PHY properly when phy_connect() or phy_start() is called. >> >> What I understand now from your "bug report" is that you want >> adjust_link to run with phydev->link = 0 to do something during >> ndo_close() but you have not explained why, but I suspect such that upon >> ndo_open() things work again. >> >> What I suspect your bug is, is that the really was no change in link >> status, no interrupt was generated because there should not be one, yet >> some of what nb8800_stop() does is not properly balanced by >> nb8800_open(). How about the following patch: >> >> diff --git a/drivers/net/ethernet/aurora/nb8800.c >> b/drivers/net/ethernet/aurora/nb8800.c >> index 041cfb7952f8..b07dea3ab019 100644 >> --- a/drivers/net/ethernet/aurora/nb8800.c >> +++ b/drivers/net/ethernet/aurora/nb8800.c >> @@ -972,6 +972,10 @@ static int nb8800_open(struct net_device *dev) >> nb8800_mac_rx(dev, true); >> nb8800_mac_tx(dev, true); >> >> + priv->speed = -1; >> + priv->link = -1; >> + priv->duplex = -1; >> + >> phydev = of_phy_connect(dev, priv->phy_node, >> nb8800_link_reconfigure, 0, >> priv->phy_mode); >> > > I will test your two patches ASAP. > > For the record, I have two (maybe related) issues: > > 1) After a sequence of > ip set link dev eth0 up > ip set link dev eth0 down > ip set link dev eth0 up > RX engine is hosed, thus network connectivity is borked. > The work-around is resetting the MAC in ndo_open > => mac_init in my proposed patch. > This is by far the biggest issue. > Also, resetting the MAC in ndo_open will make it easy > to implement suspend/resume, as I can just ndo_stop > in suspend, and ndo_open in resume.
OK. > > 2) The system does not print "Link down" when I run > 'ip set link dev eth0 down' > This is a symptom of nb8800_link_reconfigure() > not being called at all (or being called > with phydev->link == priv->link when I tried > skipping phy_stop) Correct, and there is actually a timing hazard that I could also reproduce here in that phy_stop() + phy_disconnect(), will try to cook a patch fixing that as well, since that seems highly undesirable. > > Again, thanks for your patches and suggestions, > which I will test in the morning. > > I will also try to understand why I didn't have > these issues on the other board... The timing hazard would most certainly be more prominent on certain configurations that others. -- Florian