Hi all,
I am running Linux kernel 3.4.34. Many tests that exercise network
operations, such as MTU changes, NIC up/down, and multi-queue creation,
are running on this Linux host. The host runs on vSphere and has 5 NICs,
all of them e1000 (Intel Corporation 82545EM Gigabit Ethernet Controller
(Copper) (rev 01), PCI ID 8086:100f).
The e1000 driver version is 7.3.21-k8-NAPI.
Before the issue occurs, many "Reset adapter" messages are printed, such as:
/************************/
e1000 0000:02:01.0: eth1: Reset adapter
/************************/
When this problem happened, the following messages appeared.
/*****************************************************/
Jul 6 19:08:28 localhost kernel: e1000 0000:02:08.0: eth7: e1000_reinit_safe set __E1000_RESETTING
Jul 6 19:08:28 localhost kernel: e1000 0000:02:08.0: eth7: e1000_reinit_safe take adapter's mutex
Jul 6 19:08:28 localhost kernel: e1000 0000:02:08.0: eth7: e1000_watchdog take adapter's mutex
Jul 6 19:08:28 localhost kernel: e1000 0000:02:03.0: eth3: e1000_reinit_safe release adapter's mutex
Jul 6 19:08:28 localhost kernel: e1000 0000:02:03.0: eth3: e1000_reinit_safe reset __E1000_RESETTING
/*****************************************************/
I analyzed the source code. There is a window between setting
__E1000_RESETTING and setting __E1000_DOWN.
If e1000_reinit_safe has set __E1000_RESETTING and taken the adapter's
mutex but has not yet set __E1000_DOWN, e1000_watchdog can still be
scheduled and takes the adapter's mutex. e1000_reinit_safe then shuts
down the NIC while e1000_watchdog is still in progress, and the e1000
NIC hangs.
My solution is to prevent e1000_watchdog from running in this window
between __E1000_RESETTING and __E1000_DOWN.
Is there anything wrong with this solution?
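To make the window concrete, here is a minimal userspace sketch (not
driver code). The sketch_* names, the bit values, and the watchdog_runs
counter are made up for illustration; only the two guard conditions
mirror the watchdog check before and after the proposed change. It
models the flag states only and does not model the mutex or the
workqueue.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver's __E1000_RESETTING and
 * __E1000_DOWN bits; the numeric values are illustrative only. */
enum sketch_bit { SKETCH_RESETTING = 0, SKETCH_DOWN = 1, SKETCH_NBITS };

struct sketch_adapter {
	unsigned long flags;	/* models adapter->flags */
	int watchdog_runs;	/* times the watchdog body was entered */
};

static void sketch_set_bit(struct sketch_adapter *a, enum sketch_bit bit)
{
	assert(bit < SKETCH_NBITS);
	a->flags |= 1UL << bit;
}

static bool sketch_test_bit(const struct sketch_adapter *a,
			    enum sketch_bit bit)
{
	assert(bit < SKETCH_NBITS);
	return (a->flags >> bit) & 1UL;
}

/* Guard before the change: only the DOWN bit is checked, so the
 * watchdog body still runs while RESETTING is set and DOWN is not. */
static bool watchdog_original(struct sketch_adapter *a)
{
	if (sketch_test_bit(a, SKETCH_DOWN))
		return false;
	a->watchdog_runs++;
	return true;
}

/* Guard after the change: the watchdog also returns early while
 * RESETTING is set, which closes the window. */
static bool watchdog_patched(struct sketch_adapter *a)
{
	if (sketch_test_bit(a, SKETCH_DOWN) ||
	    sketch_test_bit(a, SKETCH_RESETTING))
		return false;
	a->watchdog_runs++;
	return true;
}
```

With only SKETCH_RESETTING set, watchdog_original() still enters its
body while watchdog_patched() returns early; once SKETCH_DOWN is set,
both return early.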
Best Regards!
zhuyj
On 08/15/2013 03:09 PM, zhuyj wrote:
Hi, maintainer
Would you like to comment on this patch?
Thanks a lot.
Best Regards!
Zhu Yanjun
On 08/15/2013 03:01 PM, zhuyj wrote:
Hi,
After a long run of networking test cases, the e1000 NIC driver may
stop working. At that point the system is otherwise okay: we can run
non-network commands (such as ls, cp, etc.), but if we run a network
command (ifconfig), the system hangs there and never responds.
We added some logging to the driver and found that this was caused by
nested mutex acquisition. Normally the mutex is taken and released
before it is taken again, but when the issue occurs the log shows that
the mutex was taken, was not released, and was then taken again:
/*****************************************************/
Jul 6 19:08:28 localhost kernel: e1000 0000:02:08.0: eth7: e1000_reinit_safe set __E1000_RESETTING
Jul 6 19:08:28 localhost kernel: e1000 0000:02:08.0: eth7: e1000_reinit_safe take adapter's mutex
Jul 6 19:08:28 localhost kernel: e1000 0000:02:08.0: eth7: e1000_watchdog take adapter's mutex
Jul 6 19:08:28 localhost kernel: e1000 0000:02:03.0: eth3: e1000_reinit_safe release adapter's mutex
Jul 6 19:08:28 localhost kernel: e1000 0000:02:03.0: eth3: e1000_reinit_safe reset __E1000_RESETTING
/*****************************************************/
We made and applied the following patch, and the problem disappeared.
Please comment on this patch.
Thanks a lot.
/***********************************************/
diff --git a/drivers/net/ethernet/intel/e1000/e1000_main.c b/drivers/net/ethernet/intel/e1000/e1000_main.c
index 7569ebb..2878308 100644
--- a/drivers/net/ethernet/intel/e1000/e1000_main.c
+++ b/drivers/net/ethernet/intel/e1000/e1000_main.c
@@ -2441,7 +2441,8 @@ static void e1000_watchdog(struct work_struct *work)
 	struct e1000_tx_ring *txdr = adapter->tx_ring;
 	u32 link, tctl;
-	if (test_bit(__E1000_DOWN, &adapter->flags))
+	if (test_bit(__E1000_DOWN, &adapter->flags) ||
+	    test_bit(__E1000_RESETTING, &adapter->flags))
 		return;
/***********************************************/
zhuyj
_______________________________________________
E1000-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit
http://communities.intel.com/community/wired