On Wed, 25 Feb 2026 11:50:40 -0800 Long Li <[email protected]> wrote:
> Add support for changing the number of RX/TX queues at runtime > via port stop/configure/start. When the queue count changes, > perform a full NVS/RNDIS teardown and reinit to allocate fresh > VMBus subchannels matching the new queue count, then reconfigure > RSS indirection table accordingly. > > Key changes: > - hn_dev_configure: detect queue count changes and perform full > NVS session reinit with subchannel teardown/recreation > - hn_dev_stop: drain pending TX completions (up to 1s) to prevent > stale completions from corrupting queue state after reconfig > - eth_hn_dev_uninit: close subchannels before primary channel to > prevent resource leaks on device removal > - hn_vf_tx/rx_queue_release: null VF queue pointers after release > to prevent use-after-free when the ethdev framework releases > excess queues during VF reconfiguration > > Signed-off-by: Long Li <[email protected]> > --- Sorry for AI review overload here... ## Patch Review: `[PATCH] net/netvsc: support runtime queue count reconfiguration` ### Errors **(E1) `hn_dev_configure`: `rte_vmbus_unmap_device` after closing subchannels and primary channel — subsequent `rte_vmbus_map_device` failure leaves device in unrecoverable state with no channel and no mapping** The reinit sequence is: ```c hn_detach(hv); rte_vmbus_chan_close(hv->channels[0]); rte_free(hv->channels[0]); hv->channels[0] = NULL; rte_vmbus_unmap_device(hv->vmbus); err = rte_vmbus_map_device(hv->vmbus); if (err) { PMD_DRV_LOG(ERR, "Could not re-map vmbus device!"); goto reinit_failed; } ``` If `rte_vmbus_map_device` fails, execution goes to `reinit_failed`, which attempts recovery by calling `rte_vmbus_chan_open`. But the device is unmapped at this point — the MMIO regions are gone. `rte_vmbus_chan_open` on an unmapped device is undefined behavior (it will try to access unmapped ring buffers). The recovery path cannot work without a successful map. The recovery block should check whether the device is mapped before attempting channel open, or re-attempt the map first. **(E2) `hn_dev_configure`: `rte_free(hv->channels[0])` frees memory not allocated with `rte_malloc`** After closing the primary channel: ```c rte_vmbus_chan_close(hv->channels[0]); rte_free(hv->channels[0]); hv->channels[0] = NULL; ``` And similarly in the `hn_attach` failure recovery: ```c rte_vmbus_chan_close(hv->channels[0]); rte_free(hv->channels[0]); hv->channels[0] = NULL; ``` `channels[0]` is allocated by `rte_vmbus_chan_open()`. Whether this is `rte_malloc`'d or regular `malloc`'d depends on the vmbus implementation. If `rte_vmbus_chan_open` uses `malloc` internally (which is common for control structures), then `rte_free` is a mismatch that may silently corrupt the heap. Need to verify the allocation source matches — or use the same free that `rte_vmbus_chan_close` would use. In `eth_hn_dev_uninit` (existing code), the primary channel is freed via `rte_free(hv->primary)`, not `rte_free(hv->channels[0])`, suggesting these may be different pointers or the allocation path uses `rte_malloc`. Worth verifying. **(E3) `hn_dev_configure`: RNDIS offload configuration applied before queue count change check, not re-applied after NVS teardown/reinit** The sequence is: ```c err = hn_rndis_conf_offload(hv, ...); /* Configure RNDIS offload */ ... if (queue count unchanged) goto skip_reinit; /* Close subchannels, hn_detach(), chan_close, chan_open, hn_attach() */ ``` `hn_detach()` tears down the RNDIS session. `hn_attach()` establishes a new NVS/RNDIS session. But `hn_rndis_conf_offload()` was called *before* the teardown and is not called again after `hn_attach()`. The offload configuration from the first call is lost when the RNDIS session is destroyed. The offload configuration should be re-applied after `hn_attach()`, or moved to after the reinit block. **(E4) `hn_vf_tx_queue_release` / `hn_vf_rx_queue_release`: setting queue pointer to NULL under read lock while Tx/Rx paths read the same pointer under read lock** ```c rte_rwlock_read_lock(&hv->vf_lock); vf_dev = hn_get_vf_dev(hv); if (vf_dev && vf_dev->dev_ops->tx_queue_release) { (*vf_dev->dev_ops->tx_queue_release)(vf_dev, queue_id); vf_dev->data->tx_queues[queue_id] = NULL; } rte_rwlock_read_unlock(&hv->vf_lock); ``` The fast path (`hn_xmit_pkts`) reads `vf_dev->data->tx_queues[queue_id]` under the same read lock: ```c void *sub_q = vf_dev->data->tx_queues[queue_id]; nb_tx = (*vf_dev->tx_pkt_burst)(sub_q, tx_pkts, nb_pkts); ``` Multiple readers can hold a rwlock simultaneously. This means a Tx thread could read the queue pointer, then the release thread NULLs it and frees the underlying memory, and the Tx thread calls into freed memory. The NULL assignment after release prevents *subsequent* accesses, but doesn't protect a concurrent reader that already loaded the pointer. This needs a write lock, not a read lock. ### Warnings **(W1) `hn_dev_configure`: `subchan_cleanup` error path sets `hv->num_queues = 1` but doesn't restore RSS indirection table** The `subchan_cleanup` label closes subchannels and resets `num_queues` to 1, but the RSS indirection table (`hv->rss_ind[]`) was already updated to distribute across the new (failed) queue count: ```c for (i = 0; i < NDIS_HASH_INDCNT; i++) hv->rss_ind[i] = i % dev->data->nb_rx_queues; ``` If `nb_rx_queues` was e.g. 4 and subchannel allocation failed, the indirection table points to queues 0-3 but only queue 0 exists. The indirection table should be reset to all-zeros in the cleanup path. **(W2) `hn_dev_stop`: TX drain loop accesses `dev->data->tx_queues[i]` up to `hv->num_queues` without bounds check against configured queue count** ```c for (i = 0; i < hv->num_queues; i++) { struct hn_tx_queue *txq = dev->data->tx_queues[i]; if (txq == NULL) continue; ... ``` `hv->num_queues` could be larger than the actual number of configured Tx queues (`dev->data->nb_tx_queues`) if configuration reduced the count but `num_queues` wasn't yet updated. The NULL check provides some protection, but iterating with `RTE_MIN(hv->num_queues, dev->data->nb_tx_queues)` would be safer. **(W3) Missing `Cc: [email protected]` and `Fixes:` tag for the `eth_hn_dev_uninit` subchannel leak fix** The commit message states this change prevents resource leaks on device removal ("close subchannels before primary channel"). This is a bug fix for existing code, not just infrastructure for the new feature. It should have a `Fixes:` tag referencing the commit that added subchannel support, and `Cc: [email protected]` for backport consideration. Alternatively, this could be split into a separate bug-fix patch. **(W4) `hn_dev_configure`: recovery path uses deeply nested if/else with duplicated error logging** The `reinit_failed` block has three levels of nesting with repeated "recovery failed, device unusable" messages. Consider restructuring with early-return or goto to reduce nesting and deduplicate the error messages. **(W5) Comment style: multi-line comment at `reinit_failed` doesn't follow DPDK style** ```c /* Device is in a broken state after failed reinit. * Try to re-establish minimal connectivity. */ ``` DPDK style requires the first line of a multi-line comment to contain only `/*`: ```c /* * Device is in a broken state after failed reinit. * Try to re-establish minimal connectivity. */ ``` --- ### Summary The most critical findings are E3 (lost RNDIS offload config after teardown/reinit — the device will come back with default offload settings rather than what was requested), E4 (read lock insufficient for queue pointer NULLing — concurrent Tx/Rx can use-after-free), and E1 (recovery path after failed remap operates on unmapped device). E2 needs verification of the allocation source for `channels[0]`. The overall approach of full NVS teardown/reinit for queue count changes is sound, and the TX drain in `hn_dev_stop` and subchannel cleanup in `eth_hn_dev_uninit` are good improvements. The error handling is thorough for the most part — the recovery attempt in `reinit_failed` is a nice touch even if the unmapped-device issue needs fixing.

