On Wed, Apr 09, 2025 at 05:17:49PM +0200, Marcus Wichelmann wrote:
> Hi,
>
> in a setup where I use native XDP to redirect packets to a bonding
> interface that's backed by two ixgbe slaves, I noticed that the ixgbe
> driver constantly resets the NIC with the following kernel output:
>
>   ixgbe 0000:01:00.1 ixgbe-x520-2: Detected Tx Unit Hang (XDP)
>     Tx Queue             <4>
>     TDH, TDT             <17e>, <17e>
>     next_to_use          <181>
>     next_to_clean        <17e>
>   tx_buffer_info[next_to_clean]
>     time_stamp           <0>
>     jiffies              <10025c380>
>   ixgbe 0000:01:00.1 ixgbe-x520-2: tx hang 19 detected on queue 4, resetting adapter
>   ixgbe 0000:01:00.1 ixgbe-x520-2: initiating reset due to tx timeout
>   ixgbe 0000:01:00.1 ixgbe-x520-2: Reset adapter
>
> This only occurs in combination with a bonding interface and XDP, so I
> don't know if this is an issue with ixgbe or the bonding driver.
> I first discovered this with Linux 6.8.0-57, but kernels 6.14.0 and
> 6.15.0-rc1 show the same issue.
>
>
> I managed to reproduce this bug in a lab environment. Here are some
> details about my setup and the steps to reproduce it:
>
> NIC: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
>
> CPU: Ampere(R) Altra(R) Processor Q80-30 CPU @ 3.0GHz
> Also reproduced on:
> - Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz
> - Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
>
> Kernel: 6.15.0-rc1 (built from mainline)
>
> # ethtool -i ixgbe-x520-1
> driver: ixgbe
> version: 6.15.0-rc1
> firmware-version: 0x00012b2c, 1.3429.0
> expansion-rom-version:
> bus-info: 0000:01:00.0
> supports-statistics: yes
> supports-test: yes
> supports-eeprom-access: yes
> supports-register-dump: yes
> supports-priv-flags: yes
>
> The two ports of the NIC (named "ixgbe-x520-1" and "ixgbe-x520-2") are
> directly connected with each other using a DAC cable. Both ports are
> configured as slaves of a bonding interface in balance-rr mode.
> Neither the direct connection of both ports nor the round-robin
> bonding mode is a requirement to reproduce the issue; this setup just
> makes it easier to reproduce in an isolated environment. The issue is
> also visible with a regular 802.3ad link aggregation with a switch on
> the other side.
>
> # modprobe bonding
> # ip link set dev ixgbe-x520-1 down
> # ip link set dev ixgbe-x520-2 down
> # ip link add bond0 type bond mode balance-rr
> # ip link set dev ixgbe-x520-1 master bond0
> # ip link set dev ixgbe-x520-2 master bond0
> # ip link set dev ixgbe-x520-1 up
> # ip link set dev ixgbe-x520-2 up
> # ip link set dev bond0 up
>
> # cat /proc/net/bonding/bond0
> Ethernet Channel Bonding Driver: v6.15.0-rc1
>
> Bonding Mode: load balancing (round-robin)
> MII Status: up
> MII Polling Interval (ms): 0
> Up Delay (ms): 0
> Down Delay (ms): 0
> Peer Notification Delay (ms): 0
>
> Slave Interface: ixgbe-x520-1
> MII Status: up
> Speed: 10000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 6c:b3:11:08:5c:3c
> Slave queue ID: 0
>
> Slave Interface: ixgbe-x520-2
> MII Status: up
> Speed: 10000 Mbps
> Duplex: full
> Link Failure Count: 0
> Permanent HW addr: 6c:b3:11:08:5c:3e
> Slave queue ID: 0
>
> # ethtool -l ixgbe-x520-1
> Channel parameters for ixgbe-x520-1:
> Pre-set maximums:
> RX:             n/a
> TX:             n/a
> Other:          1
> Combined:       63
> Current hardware settings:
> RX:             n/a
> TX:             n/a
> Other:          1
> Combined:       63
> (same for ixgbe-x520-2)
>
> In the following, the xdp-tools from
> https://github.com/xdp-project/xdp-tools/ are used.
>
> Enable XDP on the bonding interface and make sure all received packets
> are dropped:
> # xdp-tools/xdp-bench/xdp-bench drop -e -i 1 bond0
>
> Redirect a batch of packets to the bonding interface:
> # xdp-tools/xdp-trafficgen/xdp-trafficgen udp --dst-mac <mac of bond0> \
>     --src-port 5000 --dst-port 6000 --threads 16 --num-packets 1000000 bond0
>
> Shortly after that (3-4 seconds), one or more "Detected Tx Unit Hang"
> errors (see above) will show up in the kernel log.
>
> The high number of packets and the thread count (--threads 16) are not
> required to trigger the issue but greatly increase its probability.
>
>
> Do you have any ideas what may be causing this issue or what I can do
> to diagnose it further?
>
> Please let me know if I should provide any more information.
>
>
> Thanks!
> Marcus
>
Hi Marcus,

Thank you for reporting this issue! I have just successfully reproduced
the problem on our lab machine. What is interesting is that I do not
seem to have to use a bonding interface to get the "Tx timeout" that
causes the adapter to reset.

I will try to debug the problem more closely and let you know of any
updates.

Thanks,
Michal
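P.S. For anyone else trying to reproduce this: since each trafficgen run
may trigger several resets, it helps to count the hang events rather
than eyeball the console. A minimal sketch (the sample log file below is
only an illustration; in practice you would pipe real `dmesg` output
through the same grep):

```shell
#!/bin/sh
# Count "Detected Tx Unit Hang" events in a captured kernel log.
# On a live system you would use: dmesg | grep -c 'Detected Tx Unit Hang'
count_tx_hangs() {
    grep -c 'Detected Tx Unit Hang' "$1"
}

# Illustrative sample standing in for captured dmesg output
# (file name is arbitrary):
cat > /tmp/ixgbe-hang-sample.log <<'EOF'
ixgbe 0000:01:00.1 ixgbe-x520-2: Detected Tx Unit Hang (XDP)
ixgbe 0000:01:00.1 ixgbe-x520-2: tx hang 19 detected on queue 4, resetting adapter
ixgbe 0000:01:00.1 ixgbe-x520-2: Detected Tx Unit Hang (XDP)
EOF

count_tx_hangs /tmp/ixgbe-hang-sample.log   # -> 2
```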