Thank you for your contribution! Sashiko AI review found 2 potential issue(s) to consider:
Pre-existing issues: - [High] State Machine Violation / NULL Pointer Dereference on interface open after a failed MTU change. - [High] Uninitialized Resource Teardown / Invalid Hardware Request during TX queue creation failure. -- commit effecc83d343cb89cadf93984f0a62ac828d0166 Author: Dipayaan Roy <[email protected]> net: mana: Add NULL guards in teardown path to prevent panic on attach failure When queue allocation fails partway through, the error cleanup frees and NULLs apc->tx_qp and apc->rxqs. Multiple teardown paths such as mana_remove(), mana_change_mtu() recovery, and internal error handling in mana_alloc_queues() can subsequently call into functions that dereference these pointers without NULL checks. > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c > b/drivers/net/ethernet/microsoft/mana/mana_en.c > index 9afc786b297a8..9e7e4bf526bf6 100644 > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c > @@ -1727,6 +1727,9 @@ static void mana_fence_rqs(struct mana_port_context > *apc) > struct mana_rxq *rxq; > int err; > > + if (!apc->rxqs) > + return; > + [Severity: High] This isn't a bug introduced by this patch, but if mana_change_mtu() fails, does it leave the device in a state where a subsequent mana_open() will crash? When mana_change_mtu() calls mana_detach(ndev, false), the device is detached and mana_cleanup_port_context() is explicitly called, freeing apc->rxqs and setting it to NULL. It then calls mana_attach(). If mana_attach() fails internally (e.g., inside its call to mana_alloc_queues()), its error path calls mana_cleanup_port_context() again, which frees apc->rxqs and sets it to NULL. mana_change_mtu() then returns the error. The interface is now in the DOWN state (apc->port_is_up == false) with apc->rxqs == NULL. If the user subsequently attempts to bring the interface up (ip link set up), mana_open() is executed. mana_open() calls mana_alloc_queues() but does not call mana_init_port() to reallocate apc->rxqs. Inside mana_alloc_queues(), mana_add_rx_queues() unconditionally dereferences apc->rxqs: apc->rxqs[i] = rxq; Could this result in a kernel panic? [ ... ] > @@ -3287,33 +3294,38 @@ static int mana_dealloc_queues(struct net_device > *ndev) > * number of queues. > */ > > - for (i = 0; i < apc->num_queues; i++) { > - txq = &apc->tx_qp[i].txq; > - tsleep = 1000; > - while (atomic_read(&txq->pending_sends) > 0 && > - time_before(jiffies, timeout)) { > - usleep_range(tsleep, tsleep + 1000); > - tsleep <<= 1; > - } > - if (atomic_read(&txq->pending_sends)) { > - err = pcie_flr(to_pci_dev(gd->gdma_context->dev)); > - if (err) { > - netdev_err(ndev, "flr failed %d with %d pkts > pending in txq %u\n", > - err, > atomic_read(&txq->pending_sends), > - txq->gdma_txq_id); > + if (apc->tx_qp) { > + for (i = 0; i < apc->num_queues; i++) { > + txq = &apc->tx_qp[i].txq; > + tsleep = 1000; > + while (atomic_read(&txq->pending_sends) > 0 && > + time_before(jiffies, timeout)) { > + usleep_range(tsleep, tsleep + 1000); > + tsleep <<= 1; > + } [Severity: High] This is a pre-existing issue, but does mana_destroy_txq() send invalid handles to the hardware when TX queue creation fails? In mana_create_txq(), the apc->tx_qp array is allocated via kzalloc_objs(), meaning all elements (including tx_object) are initialized to 0. The function then loops over num_queues. For the current queue i, it sets apc->tx_qp[i].tx_object = INVALID_MANA_HANDLE; before attempting to create the hardware object via mana_create_wq_obj(). If creation fails, it jumps to the out label and calls mana_destroy_txq(apc). mana_destroy_txq() unconditionally loops over all num_queues and calls: mana_destroy_wq_obj(apc, GDMA_SQ, apc->tx_qp[i].tx_object); Unlike mana_destroy_rxq(), which checks if (rxq->rxobj != INVALID_MANA_HANDLE), mana_destroy_txq() passes the handle directly to mana_destroy_wq_obj(). For the failed queue, the handle is INVALID_MANA_HANDLE. For all unreached queues (j > i), the handle is 0. Could this cause mana_destroy_wq_obj() to send invalid handles (including 0, which might be a valid handle for an unrelated object) to the hardware? -- Sashiko AI review ยท https://sashiko.dev/#/patchset/[email protected]?part=1
