From: vidya sagar [mailto:sagar...@gmail.com]
Sent: Wednesday, May 11, 2016 10:44 AM
To: Keller, Jacob E <jacob.e.kel...@intel.com>
Cc: e1000-devel@lists.sourceforge.net; linuxptp-us...@lists.sourceforge.net
Subject: Re: [E1000-devel] Need help with igb driver suspend crash issue
I added the dump_stack() call myself to see the full flow, because the error
message "PCIe link lost, device now detached" is printed from igb_rd32(),
which is called from many places. Shouldn't we cancel the delayed work for
igb_ptp_overflow_check() when the system goes into suspend (and reschedule it
when the system resumes)?
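
To make the question concrete, here is a minimal userspace sketch of the
ordering I have in mind. It is only a model, not driver code: every name in it
(model_nic, model_rd32, model_suspend, and so on) is hypothetical, and the
"all ones means link lost" check is a simplification of what igb_rd32() does.
In the kernel, I assume the equivalent would be cancel_delayed_work_sync() on
the PTP overflow work in the suspend path and schedule_delayed_work() again on
resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are igb driver symbols. */
struct model_nic {
    bool powered;          /* false once the device is in D3hot */
    bool detached;         /* set when a register read returns all ones */
    bool overflow_pending; /* stands in for the queued delayed work */
    uint32_t systim;       /* stands in for the timer register */
};

/* A read from a powered-down device returns all ones; the model treats
 * that value as "link lost" (simplified compared with the real driver). */
static uint32_t model_rd32(struct model_nic *nic)
{
    uint32_t val = nic->powered ? nic->systim : UINT32_MAX;

    if (val == UINT32_MAX)
        nic->detached = true;
    return val;
}

/* Periodic overflow check: touches the hardware only if still queued. */
static void model_run_overflow_work(struct model_nic *nic)
{
    if (!nic->overflow_pending)
        return;
    (void)model_rd32(nic);
}

/* Suspend: optionally cancel the pending work before powering down. */
static void model_suspend(struct model_nic *nic, bool cancel_work)
{
    if (cancel_work)
        nic->overflow_pending = false;
    nic->powered = false;
}

/* Resume: power up first, then queue the periodic work again. */
static void model_resume(struct model_nic *nic)
{
    nic->powered = true;
    nic->overflow_pending = true;
}
```

Without the cancel step, the work item that fires after suspend reads all ones
and marks the device detached; with it, the work item never touches the
powered-down device.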
On Wed, May 11, 2016 at 9:54 PM, Keller, Jacob E
<jacob.e.kel...@intel.com> wrote:
Hi,
> -----Original Message-----
> From: vidya sagar [mailto:sagar...@gmail.com]
> Sent: Wednesday, May 11, 2016 3:25 AM
> To: e1000-devel@lists.sourceforge.net;
> linuxptp-us...@lists.sourceforge.net
> Subject: Re: [E1000-devel] Need help with igb driver suspend crash issue
>
> <<< Including linuxptp-us...@lists.sourceforge.net >>>
>
> On Wed, May 11, 2016 at 3:51 PM, vidya sagar <sagar...@gmail.com> wrote:
>
> > Hi,
> > I'm using an Intel I350 NIC with the igb driver on one of our ARM-based
> > platforms. While suspending the system, we sometimes see
> > "igb 0000:01:00.0 eth1: PCIe link lost, device now detached" in the log,
> > and the subsequent resume causes the system to crash. After digging
> > through the code (BTW, I'm using the kernel-3.18 release), it looks like
> > the message comes from the following call flow, which is executed after
> > igb_suspend() is called (I confirmed this with debug prints):
> >
> > [10846.434381] [<ffffffc000089ce4>] dump_backtrace+0x0/0xf8
> > [10846.434386] [<ffffffc000089ea0>] show_stack+0x10/0x1c
> > [10846.434393] [<ffffffc000bc3b70>] dump_stack+0x80/0xc4
> > [10846.434397] [<ffffffc000613d3c>] igb_rd32+0xb0/0x1a8
> > [10846.434400] [<ffffffc00062eb0c>] igb_ptp_read_82580+0x18/0x48
> > [10846.434407] [<ffffffc000106e6c>] timecounter_read+0x1c/0x60
> > [10846.434410] [<ffffffc00062f338>] igb_ptp_gettime_82576+0x2c/0x88
> > [10846.434413] [<ffffffc00062f41c>] igb_ptp_overflow_check+0x1c/0x58
> > [10846.434419] [<ffffffc0000ba584>] process_one_work+0x154/0x414
> > [10846.434424] [<ffffffc0000bb338>] worker_thread+0x13c/0x4e4
> > [10846.434428] [<ffffffc0000bfc4c>] kthread+0xf8/0x110
> >
> > It looks like the timer register reads returned all F's because the
> > device was already in the D3hot state.
> > Is my understanding correct? Is there a patch available to fix this
> > issue?
> > Let me know if more information is needed.
> >
This may be an ordering bug in the suspend path, where we try to read
registers too late. Is that stack trace the actual crash, or did you add the
dump_stack() call yourself?
Thanks,
Jake
> > Thanks,
> > Vidya Sagar
> >