On 12/10/2013 02:42 PM, Morten Østergaard wrote:
> Hi
>
> We are experiencing a recurrent Linux kernel panic with the e1000e
> driver on a Kontron mSP1 COM Express "mini" CPU module and other similar
> COM Express modules.
>
> Unfortunately I am currently not able to save a kernel dump on the
> machines, but please see the attached jpeg image.
>
> The problem occurs frequently (right after booting and bringing the
> interface up) but not constantly. I suspect that we enter some kind of
> race condition.
>
> However, it can easily be reproduced by bringing the interface up and
> down in a loop while ping-flooding it from another host.
>
> on kontron:
>    while true; do ifdown eth0; ifup eth0; done
>
> on other Linux host:
>    ping -s1024 -w0 -f 192.168.4.1
>
> The panic will then happen every time after 1-240 seconds.
>
> I have been experimenting with various Linux kernels from 3.4 to
> 3.10.20 and various tweaked configurations, but we are currently
> staying on 3.10.10 for other reasons. I have also experimented with
> various versions of the e1000e driver from kernel.org and the e1000(e)
> Sourceforge project. However, the problem seems identical for all
> kernels and driver versions.
>
> I have attached the kernel configuration and various information from
> the machine that might help identify the issue. All output is from an
> unpatched 3.10.10 kernel from kernel.org and was collected on the same
> machine. The bom.txt file includes information about the (busybox-based)
> userspace; all other output files should be easily identified by their
> names.
>
> The attached files are also uploaded to Dropbox here:
> https://www.dropbox.com/sh/bgvme32vuzpwg4v/qHaFUZ5Tu-
>
> Any help or ideas for debugging and problem solving will be greatly
> appreciated.

Well, it looks like NAPI polling may still run while the interface is
going down. NAPI still seems to be enabled while e1000_clean_rx_ring(),
called from e1000e_down(), is executing. It is only disabled later, in
e1000_close().

By the way, napi_synchronize() only waits for a poll that is already
running; it does not prevent new NAPI events from being scheduled.

This might be what causes the panic. The following commit changed the
driver to behave this way:

"e1000e: panic caused by Rx traffic arriving while interface going down"
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=a3b87a4c69619f5366b7225aafbf7983eed31a9a

How about disabling NAPI before cleaning up the rings, similar to what
the e1000 driver does? The patch is below. I haven't tested it yet,
though.
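
To illustrate why the ordering matters, here is a minimal userspace
sketch, not driver code. It is a single-threaded toy model of the two
NAPI state bits, and all toy_* names and polls_during_cleanup() are
hypothetical. It assumes that some scheduling attempt can still arrive
after the quiesce step (the "late event"); whether and how that happens
in e1000e is exactly the open question here.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the NAPI state bits; names are hypothetical. */
struct toy_napi {
	bool sched;           /* models NAPI_STATE_SCHED */
	bool disable_pending; /* models NAPI_STATE_DISABLE */
};

/* Like napi_schedule_prep(): succeeds only if NAPI is neither being
 * disabled nor already scheduled. */
static bool toy_schedule_prep(struct toy_napi *n)
{
	if (n->disable_pending || n->sched)
		return false;
	n->sched = true;
	return true;
}

/* Like napi_complete(): the poll is done, rescheduling is allowed. */
static void toy_complete(struct toy_napi *n)
{
	assert(n->sched);
	n->sched = false;
}

/* Like napi_synchronize(): waits for a running poll to finish, but
 * leaves NAPI schedulable afterwards. */
static void toy_synchronize(struct toy_napi *n)
{
	if (n->sched)
		toy_complete(n); /* stands in for waiting on the poll */
}

/* Like napi_disable(): waits for a running poll, then keeps the SCHED
 * bit set so that later scheduling attempts fail. */
static void toy_disable(struct toy_napi *n)
{
	n->disable_pending = true;
	if (n->sched)
		toy_complete(n); /* stands in for waiting on the poll */
	n->sched = true;
	n->disable_pending = false;
}

/* Returns how many polls could start after the quiesce step, that is,
 * while the Rx ring would be under cleanup. The late event is an
 * assumption of this model. */
int polls_during_cleanup(bool use_disable)
{
	struct toy_napi n = { .sched = true, .disable_pending = false };
	int polls = 0;

	if (use_disable)
		toy_disable(&n);
	else
		toy_synchronize(&n);

	if (toy_schedule_prep(&n)) /* late event during ring cleanup */
		polls++;

	return polls;
}
```

In this model, polls_during_cleanup(false) lets the late poll start,
while polls_during_cleanup(true) rejects it, which is the effect the
patch is aiming for.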

It would be great if someone with a deeper understanding of how e1000e
works joined this discussion. Perhaps there is a better way to ensure
NAPI events cannot happen when the interface is going down?

Regards,
Eugene

----------------------
--- linux-3.10.23.old/drivers/net/ethernet/intel/e1000e/netdev.c 2013-07-01 02:13:29.000000000 +0400
+++ linux-3.10.23.new/drivers/net/ethernet/intel/e1000e/netdev.c 2013-12-16 18:29:09.645980966 +0400
@@ -4014,9 +4014,9 @@
        e1e_flush();
        usleep_range(10000, 20000);

-       e1000_irq_disable(adapter);
+       napi_disable(&adapter->napi);

-       napi_synchronize(&adapter->napi);
+       e1000_irq_disable(adapter);

        del_timer_sync(&adapter->watchdog_timer);
        del_timer_sync(&adapter->phy_info_timer);
@@ -4379,8 +4379,6 @@
                e1000_free_irq(adapter);
        }

-       napi_disable(&adapter->napi);
-
        e1000_power_down_phy(adapter);

        e1000e_free_tx_resources(adapter->tx_ring);
----------------------

-- 
Eugene Shatokhin, ROSA Laboratory.
www.rosalab.com

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit 
http://communities.intel.com/community/wired
