On Thu, 15 Sep 2011, Lucas Nussbaum wrote:

> Hi,
> 
> On 15/09/11 at 22:56 +0000, Ronciak, John wrote:
> > What I think is happening is that after it comes out of suspend the device 
> > memory mapping is either no longer mapped or it's somehow changed.  The 
> > reason I say this is:
> > 
> > [ 1569.219047] e1000e 0000:00:19.0: eth0: Error reading PHY register
> > [ 1569.437327] mount.crypt[8294]: segfault at 0 ip 00007fa19feb9a0e sp 00007fffde4e7bd0 error 4 in mount.crypt[7fa19feb2000+a000]
> > [ 1570.014032] e1000e 0000:00:19.0: eth0: Error reading PHY register
> > [ 1570.809335] e1000e 0000:00:19.0: eth0: Error reading PHY register

This is probably because the device is in D3.

> > There should be no errors from the driver when reading the PHY.  This is 
> > also probably why ethtool does not work after the suspend.
> > BTW, does ethtool work before the first time the system's been suspended? 
> > Please try this if you haven't.
> > Anyway, I think suspend or resume is doing something bad to the system.
> 
> > Can you please try something like the latest 3.0 stable kernel and see if 
> > it's working there?
> > This is most likely not a driver issue but I guess it still could be.  
> > Please let me know if the 3.0 kernel works.
> 
> After a fresh boot on 3.0 (without doing any suspend/resume cycle):
> - network doesn't work
> - ethtool shows the same errors as before

Runtime PM is doing this.

> dmesg attached.
> 
> # ifconfig eth0
> eth0      Link encap:Ethernet  HWaddr 00:24:e8:a7:2c:42  
>           inet6 addr: fe80::224:e8ff:fea7:2c42/64 Scope:Link
>           UP BROADCAST MULTICAST  MTU:1500  Metric:1
>           RX packets:3 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000 
>           RX bytes:486 (486.0 B)  TX bytes:3657 (3.5 KiB)
>           Interrupt:22 Memory:f6ae0000-f6b00000
> 
> # lspci -vvv
> 00:19.0 Ethernet controller: Intel Corporation 82567LM Gigabit Network Connection (rev 03)
>         Subsystem: Dell Device 024d
>         Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx-
>         Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
>         Interrupt: pin A routed to IRQ 22
>         Region 0: Memory at f6ae0000 (32-bit, non-prefetchable) [disabled] [size=128K]
>         Region 1: Memory at f6adb000 (32-bit, non-prefetchable) [disabled] [size=4K]
>         Region 2: I/O ports at efe0 [disabled] [size=32]
>         Capabilities: [c8] Power Management version 2
>                 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
=>                 Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME+

Look, PME+!  That means the device is actually asserting the wake signal, 
but the APIC/CPU is either not enabled or not listening.
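
To spot this condition without reading the lspci dump by eye, here is a 
rough sketch that splits the power management "Status:" line into its 
flags.  The function name parse_pm_status is made up for illustration, and 
the sample line is copied from the lspci output quoted above.

```python
def parse_pm_status(line):
    """Parse an lspci PM 'Status: D3 ... PME+' line into a dict.

    Hypothetical helper: flags ending in +/- become booleans,
    key=value tokens stay strings, the first token is the D-state.
    """
    text = line.split("Status:", 1)[1]
    tokens = text.split()
    result = {"state": tokens[0]}
    for tok in tokens[1:]:
        if tok.endswith(("+", "-")):
            result[tok[:-1]] = tok.endswith("+")
        elif "=" in tok:
            key, value = tok.split("=", 1)
            result[key] = value
    return result


# Sample line taken from the lspci output in this thread.
line = "Status: D3 NoSoftRst- PME-Enable+ DSel=0 DScale=1 PME+"
info = parse_pm_status(line)
if info["state"] != "D0" and info.get("PME"):
    print("device is in", info["state"], "with PME status set")
```

A device that sits outside D0 with PME+ set is the case worth flagging: the 
wake request was raised but apparently nothing acted on it.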

>         Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
>                 Address: 00000000fee0300c  Data: 41b9
>         Capabilities: [e0] PCI Advanced Features
>                 AFCap: TP+ FLR+
>                 AFCtrl: FLR-
>                 AFStatus: TP-
>         Kernel driver in use: e1000e
> 
> By "network doesn't work", I mean:
> - network manager shows the Ethernet network as disconnected
> - when manually starting DHCP, DHCPDISCOVERs are sent, but then I get "No 
> DHCPOFFERS received.".
> - I can assign an IP address manually using ifconfig, but then pinging another
>   machine doesn't work.
> 
> 
> What really surprises me is the "Status: D3" in lspci. Shouldn't it be D0 if
> the interface is up?

This is the crux of the issue: the runtime PM feature of the kernel is 
attempting to enable "wake on link event", but for some reason, if the 
cable is plugged in, the adapter is not successfully asserting PME#.  
Therefore runtime power management doesn't re-enable the driver and take 
the device to D0.
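
To see what runtime PM thinks the device is doing, a minimal sketch like 
the following reads the per-device power attributes from sysfs.  It 
assumes the standard power/control, power/runtime_status and power/wakeup 
files are present for the device; the helper name read_power_attrs is 
hypothetical, and the device address is the one from the lspci output 
above.

```python
from pathlib import Path


def read_power_attrs(device_dir):
    """Return the runtime PM attributes of one device from sysfs.

    Hypothetical helper; missing or unreadable attributes are
    reported as None instead of raising.
    """
    power = Path(device_dir) / "power"
    attrs = {}
    for name in ("control", "runtime_status", "wakeup"):
        try:
            attrs[name] = (power / name).read_text().strip()
        except OSError:
            attrs[name] = None
    return attrs


if __name__ == "__main__":
    # PCI address taken from the lspci output in this thread.
    dev = "/sys/bus/pci/devices/0000:00:19.0"
    print(read_power_attrs(dev))
```

If control reads "auto" and runtime_status reads "suspended" while the 
interface is up, that matches the D3 state seen in lspci.  As a test, 
writing "on" to power/control as root should keep the device out of 
runtime suspend.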

We also obviously have some bugs in our driver, in that we don't correctly 
manage the runtime power management state when ethtool queries and such 
are running.

My guess is we have a bug in the driver where we are not correctly arming 
the wake path in the kernel, or there is a kernel bug with your system 
where something is not getting hooked up.

We probably need to add a bunch of printk calls to the driver around the 
PME and wake setup code, to see exactly what is being programmed into the 
device wake registers.  We also need to check the upstream bridges and the 
APIC enables (via sysfs) to make sure wake is allowed to propagate.
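
For the sysfs part of that check, here is a rough sketch that walks from 
the device up through its parents in the sysfs device tree and reports 
each power/wakeup setting.  It assumes the usual layout, where 
/sys/bus/pci/devices/<address> is a symlink into /sys/devices; the 
function name wakeup_chain is made up for illustration.  It only covers 
the PCI hierarchy, not any platform-level wake enables.

```python
import os


def wakeup_chain(device_dir, stop_at="/sys/devices"):
    """List (path, power/wakeup value) from a device up through its parents.

    Hypothetical helper: a value of None means the device has no
    readable wakeup attribute, or the attribute is empty.
    """
    path = os.path.realpath(device_dir)
    stop = os.path.realpath(stop_at)
    chain = []
    # Stop at the sysfs device root, or at "/" as a safety net.
    while path != stop and path != os.path.dirname(path):
        wakeup = os.path.join(path, "power", "wakeup")
        try:
            with open(wakeup) as f:
                value = f.read().strip() or None
        except OSError:
            value = None
        chain.append((path, value))
        path = os.path.dirname(path)
    return chain


if __name__ == "__main__":
    # PCI address taken from the lspci output in this thread.
    for path, value in wakeup_chain("/sys/bus/pci/devices/0000:00:19.0"):
        print(value, path)
```

Any entry in the chain that reads "disabled" is a candidate for where the 
wake event is getting lost.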

You might try these runtime PM kernel boot options to see if they make any 
difference:

pcie_pme=       [PCIE,PM] Native PCIe PME signaling options:
                        Format: {auto|force}[,nomsi]
                auto    Use native PCIe PME signaling if the BIOS allows the
                        kernel to control PCIe config registers of root ports.
                force   Use native PCIe PME signaling even if the BIOS refuses
                        to allow the kernel to control the relevant PCIe config
                        registers.
                nomsi   Do not use MSI for native PCIe PME signaling (this makes
                        all PCIe root ports use INTx for everything).

