[Click] e1000 transmit lockups?

Eddie Kohler Sat, 06 Mar 2010 16:25:12 -0800

Dear Click e1000 users,

Is anyone still seeing occasional errors at driver install time, where an e1000 reports "e1000_clean_tx_irq: Detected Tx Unit Hang" and refuses to send packets?

Kevin Springborn sent the forwarded mail 2 years ago (THANKS KEVIN!!!). It described this problem and contained a fix. However, Joonwoo's drivers do not seem to contain the patch. I don't know whether the drivers have evolved enough that they no longer need the patch, or whether the bug still happens. Please let me know.


Thanks for your patience,
Eddie

--- Begin Message ---

Hi,

I think I've discovered a bug in e1000_poll_on() in Click's Intel
e1000-7.x driver. The bug only occurs when polling is enabled in the
linuxmodule. It should be present on uniprocessor machines, but is
more likely to occur on multiprocessor machines.

Symptom:
In about one out of every 20 installs of the linuxmodule, I would not see
any packets transmitted and would see a message similar to the following
in the log:
e1000: eth3: e1000_clean_tx_irq: Detected Tx Unit Hang
   Tx Queue             <0>
   TDH                  <0>
   TDT                  <2a>
   next_to_use          <2a>
   next_to_clean        <0>
 buffer_info[next_to_clean]
   time_stamp           <104941e6f>
   next_to_watch        <0>
   jiffies              <104960aec>
   next_to_watch.status <0>


Cause:
The problem is that transmitting on the adapter is never enabled if the
link state updates quickly.


Explanation:
The bug is seldom seen because a sequence of recovery steps in the
watchdog usually ends up enabling transmission.

In the case of auto recovery the link state (E1000_READ_REG(&adapter->hw,
STATUS) & E1000_STATUS_LU) remains down through the first run of the
watchdog (e1000_watchdog_1()). The watchdog detects an inconsistent state
where the link is down and netif_carrier is ok. The watchdog takes the
correct steps to resolve the inconsistency, including enabling
transmissions.

In the case where the tx_ring locks up, the link state comes back up before
the watchdog runs. Here the link is up and netif_carrier is ok =>
nothing is inconsistent => no recovery code is run => transmission is never
enabled. As a result, no packets are transmitted and the tx ring
fills up. Apply replication.patch to replicate the problem.


Solution:
The solution is to enable transmission when polling is turned on. This
behaves correctly in both of the cases mentioned above: if the link status
is not updated to 'up' before the watchdog runs, then the register that
enables tx is simply set twice. If the link status is already 'up' on the
first watchdog run, then no recovery code runs, and none is needed.


The polling.patch implements the outlined solution and also adds some
debugging output to help identify lockups.



~Kevin Springborn

polling.patch
Description: Binary data

replication.patch
Description: Binary data

_______________________________________________
click mailing list
[email protected]
https://amsterdam.lcs.mit.edu/mailman/listinfo/click

--- End Message ---
