I added the dump_stack() call myself to see the full call flow, because the
error message "PCIe link lost, device now detached" is printed from
igb_rd32(), which is called from many places. Shouldn't we cancel the
delayed work for igb_ptp_overflow_check() when the system goes into suspend
(and reschedule it when the system resumes)?
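
To make the question concrete, here is a rough userspace model of the
ordering problem. It is only a sketch, not driver code: every name in it
(model_adapter, model_rd32 and so on) is made up for illustration, and it
assumes a register read returns all F's once the device is in D3hot. In the
driver itself the corresponding calls would presumably be
cancel_delayed_work_sync() in the suspend path and schedule_delayed_work()
in the resume path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model; none of these names are igb symbols. */
struct model_adapter {
	bool in_d3hot;    /* powered down: register reads return all F's */
	bool work_armed;  /* stands in for the periodic overflow work */
	bool detached;    /* set once a read sees all F's */
	uint32_t systim;  /* pretend timer register value */
};

/* Register read: an all-F's value is treated as "link lost". */
static uint32_t model_rd32(struct model_adapter *a)
{
	uint32_t val = a->in_d3hot ? UINT32_MAX : a->systim;

	if (val == UINT32_MAX)
		a->detached = true;
	return val;
}

/* One firing of the periodic overflow check; no-op if it was cancelled. */
static void model_overflow_tick(struct model_adapter *a)
{
	if (a->work_armed)
		(void)model_rd32(a);
}

/* Suspend: optionally cancel the work before powering the device down. */
static void model_suspend(struct model_adapter *a, bool cancel_work)
{
	if (cancel_work)
		a->work_armed = false;
	a->in_d3hot = true;
}

/* Resume: power the device up again and re-arm the work. */
static void model_resume(struct model_adapter *a)
{
	a->in_d3hot = false;
	a->work_armed = true;
}

/* Returns true if the device ended up marked detached. */
static bool model_run(bool cancel_work)
{
	struct model_adapter a = { .work_armed = true, .systim = 1234 };

	model_overflow_tick(&a);        /* normal operation, read is fine */
	assert(!a.detached);
	model_suspend(&a, cancel_work);
	model_overflow_tick(&a);        /* work firing after suspend */
	model_resume(&a);
	model_overflow_tick(&a);        /* work running again after resume */
	return a.detached;
}
```

In this model, model_run(false) ends with the device marked detached because
the check fires while the device is powered down, while model_run(true)
does not, since the work was cancelled before the power-down.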

On Wed, May 11, 2016 at 9:54 PM, Keller, Jacob E <jacob.e.kel...@intel.com>
wrote:

> Hi,
>
> > -----Original Message-----
> > From: vidya sagar [mailto:sagar...@gmail.com]
> > Sent: Wednesday, May 11, 2016 3:25 AM
> > To: e1000-devel@lists.sourceforge.net; linuxptp-us...@lists.sourceforge.net
> > Subject: Re: [E1000-devel] Need help with igb driver suspend crash issue
> >
> > <<< Including linuxptp-us...@lists.sourceforge.net >>>
> >
> > On Wed, May 11, 2016 at 3:51 PM, vidya sagar <sagar...@gmail.com>
> > wrote:
> >
> > > Hi,
> > > I'm using an Intel IGB I350 NIC card on one of our ARM-based platforms.
> > > While suspending the system, we sometimes see "igb 0000:01:00.0 eth1:
> > > PCIe link lost, device now detached" in the log, and the subsequent
> > > resume causes the system to crash. After digging through the code (BTW,
> > > I'm using the kernel-3.18 release), it looks like the above message
> > > comes from the following call flow, which is executed after
> > > igb_suspend() is called (I confirmed this with the help of prints):
> > >
> > > [10846.434381] [<ffffffc000089ce4>] dump_backtrace+0x0/0xf8
> > > [10846.434386] [<ffffffc000089ea0>] show_stack+0x10/0x1c
> > > [10846.434393] [<ffffffc000bc3b70>] dump_stack+0x80/0xc4
> > > [10846.434397] [<ffffffc000613d3c>] igb_rd32+0xb0/0x1a8
> > > [10846.434400] [<ffffffc00062eb0c>] igb_ptp_read_82580+0x18/0x48
> > > [10846.434407] [<ffffffc000106e6c>] timecounter_read+0x1c/0x60
> > > [10846.434410] [<ffffffc00062f338>] igb_ptp_gettime_82576+0x2c/0x88
> > > [10846.434413] [<ffffffc00062f41c>] igb_ptp_overflow_check+0x1c/0x58
> > > [10846.434419] [<ffffffc0000ba584>] process_one_work+0x154/0x414
> > > [10846.434424] [<ffffffc0000bb338>] worker_thread+0x13c/0x4e4
> > > [10846.434428] [<ffffffc0000bfc4c>] kthread+0xf8/0x110
> > >
> > > It looks like reading the timer registers would have returned all F's,
> > > as the device is already in the D3hot state.
> > > Is my understanding correct? Is there any patch available to fix this
> > > issue?
> > > Let me know if more information is needed.
> > >
>
> Maybe it's an ordering bug in suspend, where we try to read things too
> late. Is that stack trace the actual crash, or did you add the dump_stack
> yourself?
>
> Thanks,
> Jake
>
> > > Thanks,
> > > Vidya Sagar
> > >
>