On 12/10/2013 02:42 PM, Morten Østergaard wrote:
> Hi
>
> We are experiencing a recurrent Linux kernel panic with the e1000e
> driver on a Kontron mSP1 COM Express "mini" CPU module and other similar
> COM Express modules.
>
> Unfortunately I am currently not able to save a kernel dump on the
> machines, but please see the attached jpeg image.
>
> The problem occurs frequently (right after booting and bringing the
> interface up) but not constantly. I suspect that we enter some kind of
> race condition.
>
> However it can easily be reproduced by bringing the interface up and
> down in a loop -and ping-flooding it from another host.
>
> on kontron:
> while true; do ifdown eth0; ifup eth0; done
>
> on other Linux host:
> ping -s1024 -w0 -f 192.168.4.1
>
> The panic will then happen every time after 1-240 seconds.
>
> I have been experimenting with various Linux kernels starting from 3.4
> to 3.10.20 and various tweaked configurations -but we are currently
> clinging on to 3.10.10 for other reasons. I have also experimented with
> various versions of the e1000e driver from kernel.org and the e1000(e)
> Sourceforge project. However the problem seems identical for all kernels
> and driver versions.
>
> I have attached kernel configuration, and various information from the
> machine, that might help identifying the issue. All output is from an
> unpatched 3.10.10 kernel from kernel.org and done on the same machine.
> The bom.txt file includes information about the (busybox based)
> userspace, all other output files should be easily identified by their
> name.
>
> The attached files are also uploaded to Dropbox here:
> https://www.dropbox.com/sh/bgvme32vuzpwg4v/qHaFUZ5Tu-
>
> Any help or ideas for debugging and problem solving will be greatly
> appreciated.

It looks like NAPI polling may still run while the interface is going
down. When e1000e_down() calls e1000_clean_rx_ring(), NAPI still appears
to be enabled; it is only disabled later, in e1000_close(). Note that
napi_synchronize() does not prevent new NAPI polls from being scheduled:
it only waits for a poll that is already scheduled to finish. A poll
scheduled after that point could then run while the rings are being
cleaned up, which might cause problems like this one.

The following commit changed the behaviour of the driver this way:
"e1000e: panic caused by Rx traffic arriving while interface going down"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a3b87a4c69619f5366b7225aafbf7983eed31a9a

How about disabling NAPI before the rings are cleaned up, similar to what
the e1000 driver does? The patch is below. I haven't tested it yet,
though.

It would be great if someone with a deeper understanding of how e1000e
works joined this discussion. Perhaps there is a better way to ensure
that NAPI polls cannot happen while the interface is going down?

Regards,
Eugene

----------------------
--- linux-3.10.23.old/drivers/net/ethernet/intel/e1000e/netdev.c	2013-07-01 02:13:29.000000000 +0400
+++ linux-3.10.23.new/drivers/net/ethernet/intel/e1000e/netdev.c	2013-12-16 18:29:09.645980966 +0400
@@ -4014,9 +4014,9 @@
 	e1e_flush();
 	usleep_range(10000, 20000);
 
-	e1000_irq_disable(adapter);
+	napi_disable(&adapter->napi);
 
-	napi_synchronize(&adapter->napi);
+	e1000_irq_disable(adapter);
 
 	del_timer_sync(&adapter->watchdog_timer);
 	del_timer_sync(&adapter->phy_info_timer);
@@ -4379,8 +4379,6 @@
 		e1000_free_irq(adapter);
 	}
 
-	napi_disable(&adapter->napi);
-
 	e1000_power_down_phy(adapter);
 
 	e1000e_free_tx_resources(adapter->tx_ring);
----------------------

-- 
Eugene Shatokhin, ROSA Laboratory.
www.rosalab.com
_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel