Hi Zoltan,

Sorry to hear that you are having problems.  Does this happen on the same
system when _not_ using Xen?  This device has been out for a very long time
and we have not had reports like this come in.  So please let us know if
running without Xen changes its behavior.

Thanks.

Cheers,
John

> -----Original Message-----
> From: Zoltán Halassy [mailto:cf0...@gmail.com]
> Sent: Friday, March 20, 2015 2:29 AM
> To: e1000-devel@lists.sourceforge.net
> Subject: [E1000-devel] Detected Hardware Unit Hang
> 
> I manage a Fujitsu Primergy TX100 S2 server, which has this integrated NIC:
> 
> 00:19.0 Ethernet controller [0200]: Intel Corporation 82578DM Gigabit Network Connection [8086:10ef] (rev 05)
>         Subsystem: Fujitsu Technology Solutions Device [1734:11a6]
>         Kernel driver in use: e1000e
> 
> Using kernel 3.17.7 as dom0 on Xen 4.5.0.
> 
> If the NIC is connected to a 100Mb/s link or forced to negotiate 100Mb/s (via
> ethtool advertise 0x008) it works properly. Kernel dmesg shows this:
> 
> [    8.868197] e1000e: Intel(R) PRO/1000 Network Driver - 3.1.0.2-NAPI
> [    8.868199] e1000e: Copyright(c) 1999 - 2014 Intel Corporation.
> [    8.868412] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> [    9.115322] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:19:99:a7:f8:51
> [    9.115325] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
> [    9.115359] e1000e 0000:00:19.0 eth0: MAC: 9, PHY: 9, PBA No: 313130-031
> [    9.131477] e1000e 0000:00:19.0 enp0s25: renamed from eth0
> [   29.669862] e1000e: enp0s25 NIC Link is Up 100 Mbps Full Duplex, Flow Control: Rx
> [   29.669972] e1000e 0000:00:19.0 enp0s25: 10/100 speed: disabling TSO
> 
> However if it's connected to a 1Gb/s device, these messages appear first:
> 
> [   82.149105] e1000e: Intel(R) PRO/1000 Network Driver - 3.1.0.2-NAPI
> [   82.149108] e1000e: Copyright(c) 1999 - 2014 Intel Corporation.
> [   82.149312] e1000e 0000:00:19.0: Interrupt Throttling Rate (ints/sec) set to dynamic conservative mode
> [   82.396102] e1000e 0000:00:19.0 eth0: (PCI Express:2.5GT/s:Width x1) 00:19:99:a7:f8:51
> [   82.396105] e1000e 0000:00:19.0 eth0: Intel(R) PRO/1000 Network Connection
> [   82.396139] e1000e 0000:00:19.0 eth0: MAC: 9, PHY: 9, PBA No: 313130-031
> [   82.414029] e1000e 0000:00:19.0 enp0s25: renamed from eth0
> [   93.410124] e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
> 
> Now, if I download something with high throughput (inbound traffic, say,
> 120Mb/s), it works properly too. However, if I upload something
> with high throughput (outbound traffic; the receiving end is willing to accept
> ~25Mb/s), the connection hangs for a few seconds and these
> messages appear in dmesg:
> 
> [  155.601937] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
>   TDH                  <86>
>   TDT                  <a0>
>   next_to_use          <a0>
>   next_to_clean        <86>
> buffer_info[next_to_clean]:
>   time_stamp           <fffdb4c4>
>   next_to_watch        <86>
>   jiffies              <fffdbcbc>
>   next_to_watch.status <0>
> MAC Status             <40080083>
> PHY Status             <796d>
> PHY 1000BASE-T Status  <7800>
> PHY Extended Status    <2000>
> PCI Status             <10>
> [  157.602036] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
>   TDH                  <86>
>   TDT                  <a0>
>   next_to_use          <a0>
>   next_to_clean        <86>
> buffer_info[next_to_clean]:
>   time_stamp           <fffdb4c4>
>   next_to_watch        <86>
>   jiffies              <fffdc48c>
>   next_to_watch.status <0>
> MAC Status             <40080083>
> PHY Status             <796d>
> PHY 1000BASE-T Status  <7800>
> PHY Extended Status    <2000>
> PCI Status             <10>
> [  159.601880] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
>   TDH                  <86>
>   TDT                  <a0>
>   next_to_use          <a0>
>   next_to_clean        <86>
> buffer_info[next_to_clean]:
>   time_stamp           <fffdb4c4>
>   next_to_watch        <86>
>   jiffies              <fffdcc5c>
>   next_to_watch.status <0>
> MAC Status             <40080083>
> PHY Status             <796d>
> PHY 1000BASE-T Status  <7800>
> PHY Extended Status    <2000>
> PCI Status             <10>
> [  161.601989] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
>   TDH                  <86>
>   TDT                  <a0>
>   next_to_use          <a0>
>   next_to_clean        <86>
> buffer_info[next_to_clean]:
>   time_stamp           <fffdb4c4>
>   next_to_watch        <86>
>   jiffies              <fffdd42c>
>   next_to_watch.status <0>
> MAC Status             <40080083>
> PHY Status             <796d>
> PHY 1000BASE-T Status  <7800>
> PHY Extended Status    <2000>
> PCI Status             <10>
> [  163.602096] e1000e 0000:00:19.0 enp0s25: Detected Hardware Unit Hang:
>   TDH                  <86>
>   TDT                  <a0>
>   next_to_use          <a0>
>   next_to_clean        <86>
> buffer_info[next_to_clean]:
>   time_stamp           <fffdb4c4>
>   next_to_watch        <86>
>   jiffies              <fffddbfc>
>   next_to_watch.status <0>
> MAC Status             <40080083>
> PHY Status             <796d>
> PHY 1000BASE-T Status  <7800>
> PHY Extended Status    <2000>
> PCI Status             <10>
> [  163.605665] ------------[ cut here ]------------
> [  163.605677] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x22c/0x240()
> [  163.605680] NETDEV WATCHDOG: enp0s25 (e1000e): transmit queue 0 timed out
> [  163.605682] Modules linked in: e1000e(O)
> [  163.605690] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O 3.17.7-hardened-r1-uther #3
> [  163.605693] Hardware name: FUJITSU PRIMERGY TX100 S2             /D2779, BIOS 6.00 Rev. 1.07.2779.A1 04/29/2011
> [  163.605695]  0000000000000009 ffffffff8183f2a9 ffff88016f203e60 ffffffff8106bb2d
> [  163.605699]  0000000000000000 ffff88016f203eb0 0000000000000001 0000000000000000
> [  163.605703]  0000000000000000 ffffffff8106bb97 ffffffff81ac7c18 0000000000000030
> [  163.605707] Call Trace:
> [  163.605710]  <IRQ>  [<ffffffff8183f2a9>] ? dump_stack+0x41/0x51
> [  163.605728]  [<ffffffff8106bb2d>] ? warn_slowpath_common+0x6d/0x90
> [  163.605730]  [<ffffffff8106bb97>] ? warn_slowpath_fmt+0x47/0x50
> [  163.605733]  [<ffffffff81505ed2>] ? add_interrupt_randomness+0x32/0x1e0
> [  163.605735]  [<ffffffff816d120c>] ? dev_watchdog+0x22c/0x240
> [  163.605737]  [<ffffffff816d0fe0>] ? dev_graft_qdisc+0x70/0x70
> [  163.605741]  [<ffffffff810ac232>] ? call_timer_fn.isra.36+0x12/0x70
> [  163.605744]  [<ffffffff810ac440>] ? run_timer_softirq+0x1b0/0x240
> [  163.605746]  [<ffffffff8106eb3b>] ? __do_softirq+0xdb/0x200
> [  163.605748]  [<ffffffff8106ee3d>] ? irq_exit+0x4d/0x60
> [  163.605752]  [<ffffffff814d5cef>] ? xen_evtchn_do_upcall+0x2f/0x40
> [  163.605755]  [<ffffffff818478fe>] ? xen_do_hypervisor_callback+0x1e/0x30
> [  163.605756]  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [  163.605761]  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
> [  163.605764]  [<ffffffff8100764c>] ? xen_safe_halt+0xc/0x20
> [  163.605767]  [<ffffffff81014775>] ? default_idle+0x5/0x10
> [  163.605770]  [<ffffffff810989e7>] ? cpu_startup_entry+0x217/0x260
> [  163.605772]  [<ffffffff81bdcea5>] ? 0xffffffff81bdcea5
> [  163.605774]  [<ffffffff81bdc8c8>] ? 0xffffffff81bdc8c8
> [  163.605775]  [<ffffffff81be011e>] ? 0xffffffff81be011e
> [  163.605777] ---[ end trace 7d81642d805c09bf ]---
> [  163.605785] e1000e 0000:00:19.0 enp0s25: Reset adapter unexpectedly
> [  166.432861] e1000e: enp0s25 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx
> 
> This happens with the vanilla bundled e1000e driver from 3.17.7 and with the
> 3.1.0.2 driver downloaded from intel.com, both with and without
> CFLAGS_EXTRA=-DDISABLE_PCI_MSI. The same problem occurred with the 3.10 and
> 3.8 kernels. I don't know if there was ever a functional driver, because the
> problem appeared only when we upgraded our switch to a 1Gb/s one, and at that
> time we already had the 3.9 kernel. I was hoping newer releases would fix this
> eventually, as I found some similar problems on the net. But no luck yet.
> 
> If I let the NIC negotiate 1Gb/s and try "ethtool gso off gro off tso off",
> the kernel log remains silent, but other problems appear:
> reaching the server with ssh over this NIC from the outside shows a lot of
> latency (up to 5000ms), but when I ping the server periodically (at 1s
> intervals), the server's response time gets better (~1000ms). When the server
> downloads something, this problem does not appear (download speed
> reaches 120Mb/s from the Internet, the cap of the ISP).
> 
> Should I attach my kernel config? Nothing fancy there. MSI support is
> compiled into the kernel. Only the E1000E driver is enabled as a module; the
> other Intel modules are disabled.
> 
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is
> your hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> E1000-devel mailing list
> E1000-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
> To learn more about Intel® Ethernet, visit
> http://communities.intel.com/community/wired
