>-----Original Message-----
>From: Nix [mailto:n...@esperi.org.uk]
>Sent: Saturday, January 29, 2011 3:44 PM
>To: e1000-devel@lists.sourceforge.net
>Subject: [E1000-devel] 82754L spontaneous freeze networking woes continue in 2.6.37
>
>Way back in November, in
><http://sourceforge.net/mailarchive/forum.php?thread_name=87k4kfq1at.fsf%40spindle.srvr.nix&forum_name=e1000-devel>,
>I reported a problem with the 82574 in one of my machines freezing up at
>random. This problem continues in 2.6.37, and bisection has still failed
>because the fault is so intermittent (averaging three days apart and
>sometimes taking as long as a week to freeze up, with many registers suddenly
>reset to 0xff: but sometimes it freezes in only half an hour).
>
>I moaned about it in an LWN thread as well: <http://lwn.net/Articles/416758/>
>and hmh suggested I come here, but I decided to hold off until I knew a
>bit more. Since then, I've been able to characterize it a bit. (All the
>conclusions below are tentative: perhaps I was just lucky in some cases
>and the fault happened not to kick in before I tried something else.)
>
>It happens with both the in-kernel and out-of-tree drivers in 2.6.36 and
>above, but does not affect 2.6.35 with either driver. It is *not*
>suppressed by turning off MSI-X, nor by turning off jumbo frames (both
>of which are working in 2.6.35 anyway). It is apparently suppressed by
>switching it out of gigabit mode, by turning off every machine attached
>to the subnet on which it is transmitting (though this may simply be an
>artefact caused by its not needing to send anything down the link when
>that is done), and, oddly, by pingflooding the machine (with the packets
>entering via the NIC that fails). (I've been pingflooding it for three
>weeks now, and no halts have happened. I stopped for three hours and the
>NIC locked up.)
>
>I wonder if this has something to do with PCI ASPM? The driver turns
>ASPM off at least partially for this NIC, but if the NIC is being
>flipped into some sort of low-power state when transmission ceases for a
>while, then perhaps there is a low probability of it not coming out of
>it again properly. That would explain the symptoms I see (but so would
>many other things, I suppose).

It sounds like a kernel issue based on your description, and I would not be surprised if this turns out to be related to ASPM L1.

Have you verified whether or not ASPM L0s is actually turned off on the 82574 by checking the LnkCtl capability register in the output of 'lspci -vvv -d 8086:xxxx' (where xxxx is either 10d3 or 10f6 depending on which 82574 you have)? Have you tried booting with pcie_aspm=off kernel parameter?

Bruce.
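
For the LnkCtl check suggested above, here is a minimal sketch of picking the ASPM field out of saved 'lspci -vvv' output. It assumes the usual pciutils layout, in which the link control line reads "LnkCtl: ASPM <state>; ...". The helper names and the SAMPLE text are hypothetical and hand-written for illustration; SAMPLE is not output captured from the machine in this thread.

```python
import re

def aspm_link_control(lspci_text):
    """Return the ASPM field of every LnkCtl line, e.g. 'Disabled' or 'L1 Enabled'."""
    # Anchor on "LnkCtl:" so the "ASPM ..." text on the LnkCap line is ignored.
    return re.findall(r"^\s*LnkCtl:\s*ASPM ([^;]+);", lspci_text, flags=re.MULTILINE)

def l0s_enabled(state):
    """True if the LnkCtl ASPM field reports L0s as enabled."""
    return "L0s" in state and "Enabled" in state

# Hand-written sample in the usual lspci layout (an assumption, not a real log).
SAMPLE = (
    "02:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection\n"
    "\tCapabilities: [e0] Express (v1) Endpoint, MSI 00\n"
    "\t\tLnkCap:\tPort #0, Speed 2.5GT/s, Width x1, ASPM L0s L1, Latency L0 <128ns, L1 <64us\n"
    "\t\tLnkCtl:\tASPM L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk+\n"
)

if __name__ == "__main__":
    for state in aspm_link_control(SAMPLE):
        print("LnkCtl ASPM:", state, "| L0s enabled:", l0s_enabled(state))
```

To use it on a real system, save the output of 'lspci -vvv -d 8086:10d3' (or 10f6) to a file and pass its contents to aspm_link_control(). lspci normally needs root to show the Express capability block; without it the LnkCtl line may be missing and the function returns an empty list.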