There is a very nasty bug in the e1000e network card
driver.

I am running Debian 12 Bookworm.

You will get the message "Detected Hardware Unit Hang" and then
the network card just stops working.

This is a built-in NIC on the computer.
The computer is an HP ProDesk 600 G4 MT.

This is the microtower version, as denoted by the MT.


This log comes from my /var/log/syslog.


Apr 15 01:57:12 gateway vmunix: [ 7743.893557] e1000e 0000:00:1f.6 eth1: Detected Hardware Unit Hang:
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] TDH                  <b7>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] TDT                  <ed>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] next_to_use          <ed>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] next_to_clean        <b7>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] buffer_info[next_to_clean]:
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] time_stamp           <1001c6345>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] next_to_watch        <b7>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] jiffies              <1001c6550>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] next_to_watch.status <0>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] MAC Status             <80083>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] PHY Status             <796d>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] PHY 1000BASE-T Status  <3800>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] PHY Extended Status    <3000>
Apr 15 01:57:12 gateway vmunix: [ 7743.893557] PCI Status             <10>
Apr 15 01:57:13 gateway vmunix: [ 7744.123237] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:13 gateway vmunix: [ 7744.417235] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:14 gateway vmunix: [ 7745.412183] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:14 gateway vmunix: [ 7745.659234] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] e1000e 0000:00:1f.6 eth1: Detected Hardware Unit Hang:
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] TDH                  <b7>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] TDT                  <ed>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] next_to_use          <ed>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] next_to_clean        <b7>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] buffer_info[next_to_clean]:
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] time_stamp           <1001c6345>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] next_to_watch        <b7>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] jiffies              <1001c6740>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] next_to_watch.status <0>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] MAC Status             <80083>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] PHY Status             <796d>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] PHY 1000BASE-T Status  <3800>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] PHY Extended Status    <3000>
Apr 15 01:57:14 gateway vmunix: [ 7745.877564] PCI Status             <10>
Apr 15 01:57:15 gateway vmunix: [ 7746.220253] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:15 gateway vmunix: [ 7746.485268] net-fw DROP IN=eth0 OUT= MAC=00:13:3b:e3:8f:b0:0c:a4:02:35:6d:87:08:00 SRC=75.159.223.219 DST=199.126.41.116 LE>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] e1000e 0000:00:1f.6 eth1: Detected Hardware Unit Hang:
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] TDH                  <b7>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] TDT                  <ed>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] next_to_use          <ed>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] next_to_clean        <b7>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] buffer_info[next_to_clean]:
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] time_stamp           <1001c6345>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] next_to_watch        <b7>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] jiffies              <1001c6938>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] next_to_watch.status <0>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] MAC Status             <80083>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] PHY Status             <796d>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] PHY 1000BASE-T Status  <3800>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] PHY Extended Status    <3000>
Apr 15 01:57:16 gateway vmunix: [ 7747.893578] PCI Status             <10>
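
For anyone reading the dumps above, here is a rough Python sketch of how I read the numbers. It only uses the hex values from the log plus the TX ring size of 256 reported by ethtool -g eth1. The function names are my own, and the tick rate is not stated anywhere in the log, so the sketch estimates it from the kernel timestamps on the assumption that jiffies advance at a constant rate.

```python
# Hex values copied from the three "Detected Hardware Unit Hang" dumps above.
TX_RING_SIZE = 256            # current TX ring size reported by `ethtool -g eth1`
TDH, TDT = 0xB7, 0xED         # transmit descriptor head / tail, identical in all three dumps
TIME_STAMP = 0x1001C6345      # buffer_info[next_to_clean] time_stamp, identical in all dumps

# (kernel log timestamp in seconds, jiffies value) for each dump
DUMPS = [
    (7743.893557, 0x1001C6550),
    (7745.877564, 0x1001C6740),
    (7747.893578, 0x1001C6938),
]


def pending_descriptors(tdh: int, tdt: int, ring_size: int) -> int:
    """Descriptors queued at the tail that the head has not reached yet."""
    return (tdt - tdh) % ring_size


def estimate_hz(dumps) -> int:
    """ASSUMPTION: jiffies advance at a constant tick rate between dumps."""
    (t0, j0), (t1, j1) = dumps[0], dumps[-1]
    return round((j1 - j0) / (t1 - t0))


def stall_seconds(time_stamp: int, jiffies: int, hz: int) -> float:
    """How long the buffer at next_to_clean has been waiting, in seconds."""
    return (jiffies - time_stamp) / hz


hz = estimate_hz(DUMPS)
print("pending TX descriptors:", pending_descriptors(TDH, TDT, TX_RING_SIZE))
print("estimated HZ:", hz)
for log_time, jiffies in DUMPS:
    waited = stall_seconds(TIME_STAMP, jiffies, hz)
    print(f"[{log_time:.6f}] oldest buffer waiting for {waited:.2f} s")
```

If I am reading this right, the head pointer stays at b7 with 54 descriptors still queued, and the same buffer is roughly 2, 4 and 6 seconds old in the three dumps, so the transmit ring does not appear to advance at all between them.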


It does this multiple times, and the network interface (eth1 in this case) becomes unstable and stops responding. I can't have that, because this computer is being used as a gateway. Usually the only thing you can do at that point is reboot the
machine.

uname -a
Linux gateway 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

This is a gigabit network card. As I said, it is a built-in NIC, and I believe it is an Intel NIC.

ethtool --show-eee eth1
EEE settings for eth1:
        EEE status: enabled - inactive
        Tx LPI: 17 (us)
        Supported EEE link modes:  100baseT/Full
                                   1000baseT/Full
        Advertised EEE link modes:  100baseT/Full
                                    1000baseT/Full
        Link partner advertised EEE link modes:  Not reported


ethtool -g eth1
Ring parameters for eth1:
Pre-set maximums:
RX:             4096
RX Mini:        n/a
RX Jumbo:       n/a
TX:             4096
Current hardware settings:
RX:             256
RX Mini:        n/a
RX Jumbo:       n/a
TX:             256
RX Buf Len:             n/a
CQE Size:               n/a
TX Push:        off
TCP data split: n/a


lspci
00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:02.0 VGA compatible controller: Intel Corporation CoffeeLake-S GT2 [UHD Graphics 630]
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.3 Network controller: Intel Corporation Cannon Lake PCH CNVi WiFi (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:16.3 Serial controller: Intel Corporation Cannon Lake PCH Active Management Technology - SOL (rev 10)
00:17.0 RAID bus controller: Intel Corporation SATA Controller [RAID mode] (rev 10)
00:1c.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port #6 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Q370 Chipset LPC/eSPI Controller (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
00:1f.6 Ethernet controller: Intel Corporation Ethernet Connection (7) I219-LM (rev 10)
01:00.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 06)

lspci -vn

00:1f.6 0200: 8086:15bb (rev 10)
        DeviceName: Onboard Lan
        Subsystem: 103c:83ed
        Flags: bus master, fast devsel, latency 0, IRQ 123, IOMMU group 8
        Memory at f1180000 (32-bit, non-prefetchable) [size=128K]
        Capabilities: [c8] Power Management version 3
        Capabilities: [d0] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Kernel driver in use: e1000e
        Kernel modules: e1000e

This seems to happen when you are pushing a bit of traffic,
though not a lot; even a little bit is enough.  It isn't network overload
or anything. I am barely doing anything, but it will still do this.


I have already tried the following:

ethtool -K eth1 tx off rx off
ethtool -K eth1 tso off gso off
ethtool -K eth1 gso off gro off tso off tx off rx off rxvlan off txvlan off sg off
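
To double check that these settings actually stuck, the output of ethtool -k eth1 (lowercase k) can be compared against the list of features that were meant to be off. Below is a minimal Python sketch of that check. The helper names are hypothetical, and the mapping from the short flags used above to the long feature names printed by ethtool -k is my assumption of how they correspond.

```python
import subprocess

# ASSUMPTION: long feature names that correspond to the short flags
# tx, rx, sg, tso, gso, gro, rxvlan and txvlan passed to `ethtool -K`.
WANTED_OFF = [
    "tx-checksumming",
    "rx-checksumming",
    "scatter-gather",
    "tcp-segmentation-offload",
    "generic-segmentation-offload",
    "generic-receive-offload",
    "rx-vlan-offload",
    "tx-vlan-offload",
]


def parse_features(text: str) -> dict:
    """Turn `ethtool -k` output into {feature name: "on" or "off"}."""
    features = {}
    for line in text.splitlines():
        name, sep, value = line.strip().partition(":")
        value = value.strip()
        if not sep or not value:
            continue  # skips the "Features for eth1:" header and blank lines
        features[name] = value.split()[0]  # drops a trailing "[fixed]" marker
    return features


def still_enabled(features: dict, wanted_off: list) -> list:
    """Features that should be off but are still reported as on."""
    return [name for name in wanted_off if features.get(name) == "on"]


def read_features(iface: str) -> dict:
    """Run `ethtool -k <iface>` and parse the result (needs ethtool installed)."""
    result = subprocess.run(
        ["ethtool", "-k", iface], capture_output=True, text=True, check=True
    )
    return parse_features(result.stdout)
```

Calling still_enabled(read_features("eth1"), WANTED_OFF) on the gateway should return an empty list if every offload listed above is really off.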

I have disabled all power management in the BIOS as well, including the one
for ASPM.

I added the following to grub

pcie_aspm=off e1000e.SmartPowerDownEnable=0


This is in /etc/default/grub
GRUB_CMDLINE_LINUX_DEFAULT="quiet pcie_aspm=off e1000e.SmartPowerDownEnable=0"

Then I did an update-grub as well.
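
After rebooting, one way to confirm that the parameters really reached the running kernel is to look at /proc/cmdline. The following is a small Python sketch of that check, with hypothetical helper names; it only does an exact match on each parameter string.

```python
# The two parameters added to GRUB_CMDLINE_LINUX_DEFAULT above.
EXPECTED = ["pcie_aspm=off", "e1000e.SmartPowerDownEnable=0"]


def missing_params(cmdline: str, expected: list) -> list:
    """Return the expected parameters that do not appear on the command line."""
    present = set(cmdline.split())
    return [param for param in expected if param not in present]


def read_cmdline(path: str = "/proc/cmdline") -> str:
    """Read the command line the running kernel was booted with."""
    with open(path) as handle:
        return handle.read().strip()
```

Running missing_params(read_cmdline(), EXPECTED) on the gateway should give an empty list if both options were picked up at boot.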

None of this has worked in fixing this problem.  I am still getting the same issue.


Can you please fix this issue? This is a really nasty problem with Debian 12 (Bookworm).

I see this was reported back in kernel 5.3.x, but I am not seeing any
reports about this issue for 6.1.x.

Debian Bug report logs - #945912
Kernel 5.3 e100e Detected Hardware Unit Hang
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=945912


To reproduce this: if you had the same type of computer with the same NIC in it
and installed Debian 12, it will happen.

This should go to your kernel team, I believe, as this is an issue with the kernel driver
module for this NIC.


Any response on this bug should be sent via email only, please.

Please reply and confirm that you got this email and that you are looking
into this problem.


Thank you,

Jamie (she / her)
