On Wed, Aug 06, 2025 at 07:05:19PM +0200, Paul Menzel wrote:
> Dear Maciej,
>
>
> Thank you for your patch.
>
>
> On 06.08.25 at 18:58, Maciej Fijalkowski wrote:
> > Currently ixgbe driver checks periodically in its watchdog subtask if
> > there is anything to be transmitted (considering both Tx and XDP rings)
> > under state of carrier not being 'ok'. Such event is interpreted as Tx
> > hang and therefore results in interface reset.
>
> For people grepping through commit messages, add some more details how the
> hang manifests?

Hi Paul,

I didn't want to repeat too much of what is already covered under the link
to the original report (see the lore link in the Closes: tag). I know this
adds a level of indirection for a future reader, but I assumed the lore
link will not go away and that it is safe to rely on it.

>
> > This is currently problematic for ndo_xdp_xmit() as it is allowed to
> > produce descriptors when interface is going through reset or its carrier
> > is turned off.
> >
> > Furthermore, XDP rings should not really be objects of Tx hang
> > detection. This mechanism is rather a matter of ndo_tx_timeout() being
> > called from dev_watchdog against Tx rings exposed to networking stack.
> >
> > Taking into account issues described above, let us have a two fold fix -
> > do not respect XDP rings in local ixgbe watchdog and do not produce Tx
> > descriptors in ndo_xdp_xmit callback when there is some problem with
> > carrier currently. For now, keep the Tx hang checks in clean Tx irq
> > routine, but adjust it to not execute for XDP rings.
>
> Do you have a reproducer for this?

Again, the original report has it. xdp-trafficgen was used to trigger this
problem.
I am not sure it is worth re-spinning, especially since Marcus and Tobias
might be angry with me that it still hasn't made it to mainline :P

>
> > Cc: Tobias Böhm <[email protected]>
> > Reported-by: Marcus Wichelmann <[email protected]>
> > Closes:
> > https://lore.kernel.org/netdev/[email protected]/
> > Fixes: 6453073987ba ("ixgbe: add initial support for xdp redirect")
> > Fixes: 33fdc82f0883 ("ixgbe: add support for XDP_TX action")
> > Reviewed-by: Aleksandr Loktionov <[email protected]>
> > Tested-by: Marcus Wichelmann <[email protected]>
> > Signed-off-by: Maciej Fijalkowski <[email protected]>
> > ---
> > v1->v2:
> > * collect tags
> > * fix typos (Dawid)
> > ---
> >   drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 34 ++++++-------------
> >   1 file changed, 11 insertions(+), 23 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > index 03d31e5b131d..7c0db3b3ee8e 100644
> > --- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > +++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
> > @@ -967,10 +967,6 @@ static void ixgbe_update_xoff_rx_lfc(struct
> > ixgbe_adapter *adapter)
> >  	for (i = 0; i < adapter->num_tx_queues; i++)
> >  		clear_bit(__IXGBE_HANG_CHECK_ARMED,
> >  			  &adapter->tx_ring[i]->state);
> > -
> > -	for (i = 0; i < adapter->num_xdp_queues; i++)
> > -		clear_bit(__IXGBE_HANG_CHECK_ARMED,
> > -			  &adapter->xdp_ring[i]->state);
> >  }
> >  static void ixgbe_update_xoff_received(struct ixgbe_adapter *adapter)
> > @@ -1264,10 +1260,13 @@ static bool ixgbe_clean_tx_irq(struct
> > ixgbe_q_vector *q_vector,
> >  				  total_bytes);
> >  	adapter->tx_ipsec += total_ipsec;
> > +	if (ring_is_xdp(tx_ring))
> > +		return !!budget;
> > +
> >  	if (check_for_tx_hang(tx_ring) && ixgbe_check_tx_hang(tx_ring)) {
> >  		/* schedule immediate reset if we believe we hung */
> >  		struct ixgbe_hw *hw = &adapter->hw;
> > -		e_err(drv, "Detected Tx Unit Hang %s\n"
> > +		e_err(drv, "Detected Tx Unit Hang\n"
> >  			" Tx Queue <%d>\n"
> >  			" TDH, TDT <%x>, <%x>\n"
> >  			" next_to_use <%x>\n"
> > @@ -1275,16 +1274,14 @@ static bool ixgbe_clean_tx_irq(struct
> > ixgbe_q_vector *q_vector,
> >  			"tx_buffer_info[next_to_clean]\n"
> >  			" time_stamp <%lx>\n"
> >  			" jiffies <%lx>\n",
> > -			ring_is_xdp(tx_ring) ? "(XDP)" : "",
> >  			tx_ring->queue_index,
> >  			IXGBE_READ_REG(hw, IXGBE_TDH(tx_ring->reg_idx)),
> >  			IXGBE_READ_REG(hw, IXGBE_TDT(tx_ring->reg_idx)),
> >  			tx_ring->next_to_use, i,
> >  			tx_ring->tx_buffer_info[i].time_stamp, jiffies);
> > -		if (!ring_is_xdp(tx_ring))
> > -			netif_stop_subqueue(tx_ring->netdev,
> > -					    tx_ring->queue_index);
> > +		netif_stop_subqueue(tx_ring->netdev,
> > +				    tx_ring->queue_index);
> >  		e_info(probe,
> >  		       "tx hang %d detected on queue %d, resetting adapter\n",
> > @@ -1297,9 +1294,6 @@ static bool ixgbe_clean_tx_irq(struct ixgbe_q_vector
> > *q_vector,
> >  		return true;
> >  	}
> > -	if (ring_is_xdp(tx_ring))
> > -		return !!budget;
> > -
> >  #define TX_WAKE_THRESHOLD (DESC_NEEDED * 2)
> >  	txq = netdev_get_tx_queue(tx_ring->netdev, tx_ring->queue_index);
> >  	if (!__netif_txq_completed_wake(txq, total_packets, total_bytes,
> > @@ -7796,12 +7790,9 @@ static void ixgbe_check_hang_subtask(struct
> > ixgbe_adapter *adapter)
> >  		return;
> >  	/* Force detection of hung controller */
> > -	if (netif_carrier_ok(adapter->netdev)) {
> > +	if (netif_carrier_ok(adapter->netdev))
> >  		for (i = 0; i < adapter->num_tx_queues; i++)
> >  			set_check_for_tx_hang(adapter->tx_ring[i]);
> > -		for (i = 0; i < adapter->num_xdp_queues; i++)
> > -			set_check_for_tx_hang(adapter->xdp_ring[i]);
> > -	}
> >  	if (!(adapter->flags & IXGBE_FLAG_MSIX_ENABLED)) {
> >  		/*
> > @@ -8016,13 +8007,6 @@ static bool ixgbe_ring_tx_pending(struct
> > ixgbe_adapter *adapter)
> >  			return true;
> >  	}
> > -	for (i = 0; i < adapter->num_xdp_queues; i++) {
> > -		struct ixgbe_ring *ring = adapter->xdp_ring[i];
> > -
> > -		if (ring->next_to_use != ring->next_to_clean)
> > -			return true;
> > -	}
> > -
> >  	return false;
> >  }
> > @@ -10825,6 +10809,10 @@ static int ixgbe_xdp_xmit(struct net_device *dev,
> > int n,
> >  	if (unlikely(test_bit(__IXGBE_DOWN, &adapter->state)))
> >  		return -ENETDOWN;
> > +	if (!netif_carrier_ok(adapter->netdev) ||
> > +	    !netif_running(adapter->netdev))
> > +		return -ENETDOWN;
> > +
>
> I am no expert here, but should the commit be split into two?

Fixing the producer in one commit and the consumer in another means that
the first commit would still leave the driver broken, which would not be
a real *fix*. You can think of ixgbe_xdp_xmit() as the producer of
descriptors and ixgbe_clean_tx_irq() as the consumer (in reality HW is the
consumer, but I hope this analogy makes some sense to you).

>
> >  	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
> >  		return -EINVAL;
>
>
> Kind regards,
>
> Paul
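
For readers who want the producer/consumer argument spelled out, here is a
rough userspace sketch in plain C. It is a toy model, not ixgbe code: every
toy_* name is hypothetical, the ring is reduced to two counters, and the
"carrier drops right after a descriptor was queued" sequence is an assumed
scenario for illustration rather than something taken from the report.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model only. All toy_* names are hypothetical, not ixgbe symbols. */
struct toy_adapter {
	bool carrier_ok;
	unsigned int next_to_use;	/* bumped by the producer */
	unsigned int next_to_clean;	/* bumped when a descriptor completes */
};

/*
 * Producer: queue one descriptor on the (single) XDP ring.
 * guard_carrier == true models the carrier check the patch adds to
 * ixgbe_xdp_xmit(); false models the old behaviour.
 */
static int toy_xdp_xmit(struct toy_adapter *a, bool guard_carrier)
{
	assert(a);
	if (guard_carrier && !a->carrier_ok)
		return -ENETDOWN;
	a->next_to_use++;
	return 1;
}

/*
 * Watchdog: pending work while carrier is down is read as a Tx hang.
 * skip_xdp == true models the patch dropping XDP rings from that check.
 */
static bool toy_watchdog_wants_reset(const struct toy_adapter *a, bool skip_xdp)
{
	if (a->carrier_ok || skip_xdp)
		return false;
	return a->next_to_use != a->next_to_clean;
}

/* Descriptors queued but not yet completed. */
static unsigned int toy_pending(const struct toy_adapter *a)
{
	return a->next_to_use - a->next_to_clean;
}
```

In this model, guarding only the producer still lets the watchdog ask for
a reset when a descriptor was queued just before carrier went down, while
changing only the watchdog still lets descriptors pile up on a ring that
cannot transmit. Only with both flags set does the toy stay quiet, which
is the reason for keeping the two changes in a single commit.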
