On Mon, Feb 05, 2018 at 10:48:55PM +0100, Heiner Kallweit wrote: > Am 04.02.2018 um 03:48 schrieb Florian Fainelli: > > > > > > On 02/03/2018 03:58 PM, Heiner Kallweit wrote: > >> Am 03.02.2018 um 21:17 schrieb Andrew Lunn: > >>> On Sat, Feb 03, 2018 at 05:41:54PM +0100, Heiner Kallweit wrote: > >>>> This commit forces callers of phy_resume() and phy_suspend() to hold > >>>> mutex phydev->lock. This was done for calls to phy_resume() and > >>>> phy_suspend() in phylib, however there are more callers in network > >>>> drivers. I'd assume that these other calls issue a warning now > >>>> because of the lock not being held. > >>>> So is there something I miss or would this have to be fixed? > >>> > >>> Hi Heiner > >>> > >>> This is a good point. > >>> > >>> Yes, it looks like some fixes are needed. But what exactly? > >>> > >>> The phy state machine will suspend and resume the phy is you call > >>> phy_stop() and phy_start() in the MAC suspend and resume functions. > >>> > >> AFAICS phy_stop() doesn't suspend the PHY. It just sets the state > >> to PHY_HALTED and (at least if we're not in polling mode) doesn't > >> call phy_suspend(). Maybe a call to phy_trigger_machine() is > >> needed like in phy_start() ? Then the state machine would call > >> phy_suspend(), provided the link is still up. > > > > Right, phy_stop() merely just moves the state machine to PHY_HALTED and > > this is actually a great source of problems which I tried to address here: > > > > https://www.mail-archive.com/netdev@vger.kernel.org/msg196061.html > > > > because phy_stop() is not a synchronous call, so when it returns the > > state machine might still be running (it can take up to a 1 HZ depending > > on when you called phy_stop()) and so if you took that as a > > synchronization point to e.g: turn of your Ethernet MAC/MDIO bus clocks, > > you will likely see problems. phy_stop_machine() would provide that > > synchronization point, but is not currently exported, despite being a > > global symbol. This patch series above is all well and good, except that > > Geert reported issues with suspend/resume interactions which I have not > > been able to track down. > > > > We should most definitively try to consolidate the different PHY > > suspend/resume within the Ethernet MAC suspend/resume implementation and > > document exactly what the appropriate behavior must be under the > > following circumstances: > > > > - when to call phy_stop() + phy_stop_machine() > > - when to call phy_suspend() (if the network interface does do not WoL) > > - when to call phy_resume() (if needed, actually, it usually is not) > > - when to call phy_start() > > > > I think phy_start() / phy_start_machine() / phy_start_interrupts() > belong together and we may call the latter two functions from phy_start(). > Same for stop. > > This would mean: > - Remove call to phy_start_interrupts() from phy_connect_direct() > - Call phy_start_machine() and phy_start_interrupts() from phy_start() > - mdio_bus_phy_suspend() calls phy_stop() > Same for stop, plus: phy_error() calls phy_stop(). > > In this setup a second call to phy_stop() wouldn't hurt because state > is PHY_HALTED already and phy_stop() is a no-op. > > A functional change would be that interrupts are disabled during system > suspend (except WoL because we don't suspend the PHY is this case). > > These are first thoughts and therefore it's fine if you totally disagree .. > I didn't test this yet, it's only a "Gedankenexperiment" so far. > > When talking about suspend/resume I think we talked about system suspend / > resume. However I think we need to consider also runtime pm. > If a link is down the network driver may decide to runtime-suspend the PHY > (power it down). In case of runtime pm I'd say we need to keep irq and > workqueue active to be able to react if a cable is plugged in and the PHY > wakes up automatically and establishes a link.
Maybe a better solution now would be to restore phy_resume()'s lock- taking behaviour, and provide a lockless __phy_resume() which can be used internally within phylib. This means drivers using phy_resume() would see no change. Maybe something like (untested): drivers/net/phy/phy.c | 2 +- drivers/net/phy/phy_device.c | 17 ++++++++++++++--- include/linux/phy.h | 1 + 3 files changed, 16 insertions(+), 4 deletions(-) diff --git a/drivers/net/phy/phy.c b/drivers/net/phy/phy.c index f3313a129531..4574d02dce93 100644 --- a/drivers/net/phy/phy.c +++ b/drivers/net/phy/phy.c @@ -819,7 +819,7 @@ void phy_start(struct phy_device *phydev) break; case PHY_HALTED: /* if phy was suspended, bring the physical link up again */ - phy_resume(phydev); + __phy_resume(phydev); /* make sure interrupts are re-enabled for the PHY */ if (phydev->irq != PHY_POLL) { diff --git a/drivers/net/phy/phy_device.c b/drivers/net/phy/phy_device.c index b13eed21c87d..ae0f9306bbdc 100644 --- a/drivers/net/phy/phy_device.c +++ b/drivers/net/phy/phy_device.c @@ -136,7 +136,7 @@ static int mdio_bus_phy_resume(struct device *dev) goto no_resume; mutex_lock(&phydev->lock); - ret = phy_resume(phydev); + ret = __phy_resume(phydev); mutex_unlock(&phydev->lock); if (ret < 0) return ret; @@ -1042,7 +1042,7 @@ int phy_attach_direct(struct net_device *dev, struct phy_device *phydev, goto error; mutex_lock(&phydev->lock); - phy_resume(phydev); + __phy_resume(phydev); mutex_unlock(&phydev->lock); phy_led_triggers_register(phydev); @@ -1172,7 +1172,7 @@ int phy_suspend(struct phy_device *phydev) } EXPORT_SYMBOL(phy_suspend); -int phy_resume(struct phy_device *phydev) +int __phy_resume(struct phy_device *phydev) { struct phy_driver *phydrv = to_phy_driver(phydev->mdio.dev.driver); int ret = 0; @@ -1189,6 +1189,17 @@ int phy_resume(struct phy_device *phydev) return ret; } + +int phy_resume(struct phy_device *phydev) +{ + int ret; + + mutex_lock(&phydev->lock); + ret = __phy-resume(phydev); + mutex_unlock(&phydev->lock); + + return ret; +} EXPORT_SYMBOL(phy_resume); int phy_loopback(struct phy_device *phydev, bool enable) diff --git a/include/linux/phy.h b/include/linux/phy.h index 5a0c3e53e7c2..8f82bd64f82d 100644 --- a/include/linux/phy.h +++ b/include/linux/phy.h @@ -923,6 +923,7 @@ static inline void phy_device_free(struct phy_device *phydev) { } void phy_device_remove(struct phy_device *phydev); int phy_init_hw(struct phy_device *phydev); int phy_suspend(struct phy_device *phydev); +int __phy_resume(struct phy_device *phydev); int phy_resume(struct phy_device *phydev); int phy_loopback(struct phy_device *phydev, bool enable); struct phy_device *phy_attach(struct net_device *dev, const char *bus_id, -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up