RE: [EXTERNAL] Re: [PATCH] net/netvsc: support runtime queue count reconfiguration

Long Li Wed, 25 Feb 2026 20:18:06 -0800

> Subject: [EXTERNAL] Re: [PATCH] net/netvsc: support runtime queue count
> reconfiguration
> 
> On Wed, 25 Feb 2026 11:50:40 -0800
> Long Li <[email protected]> wrote:
> 
> > Add support for changing the number of RX/TX queues at runtime via
> > port stop/configure/start. When the queue count changes, perform a
> > full NVS/RNDIS teardown and reinit to allocate fresh VMBus subchannels
> > matching the new queue count, then reconfigure RSS indirection table
> > accordingly.
> >
> > Key changes:
> > - hn_dev_configure: detect queue count changes and perform full
> >   NVS session reinit with subchannel teardown/recreation
> > - hn_dev_stop: drain pending TX completions (up to 1s) to prevent
> >   stale completions from corrupting queue state after reconfig
> > - eth_hn_dev_uninit: close subchannels before primary channel to
> >   prevent resource leaks on device removal
> > - hn_vf_tx/rx_queue_release: null VF queue pointers after release
> >   to prevent use-after-free when the ethdev framework releases
> >   excess queues during VF reconfiguration
> >
> > Signed-off-by: Long Li <[email protected]>
> > ---
> 
> Sorry for AI review overload here...


E2 is not correct, all others make sense.

I'm sending v2, with two patches. One original patch, the other patch suggested 
by W3 with a "fixes" tag.

Thanks,
Long

> 
> ## Patch Review: `[PATCH] net/netvsc: support runtime queue count
> reconfiguration`
> 
> 
> 
> ### Errors
> 
> **(E1) `hn_dev_configure`: `rte_vmbus_unmap_device` after closing subchannels
> and primary channel — subsequent `rte_vmbus_map_device` failure leaves
> device in unrecoverable state with no channel and no mapping**
> 
> The reinit sequence is:
> 
> ```c
> hn_detach(hv);
> rte_vmbus_chan_close(hv->channels[0]);
> rte_free(hv->channels[0]);
> hv->channels[0] = NULL;
> 
> rte_vmbus_unmap_device(hv->vmbus);
> err = rte_vmbus_map_device(hv->vmbus);
> if (err) {
>     PMD_DRV_LOG(ERR, "Could not re-map vmbus device!");
>     goto reinit_failed;
> }
> ```
> 
> If `rte_vmbus_map_device` fails, execution goes to `reinit_failed`, which 
> attempts
> recovery by calling `rte_vmbus_chan_open`. But the device is unmapped at this
> point — the MMIO regions are gone. `rte_vmbus_chan_open` on an unmapped
> device is undefined behavior (it will try to access unmapped ring buffers). 
> The
> recovery path cannot work without a successful map. The recovery block should
> check whether the device is mapped before attempting channel open, or re-
> attempt the map first.
> 
> **(E2) `hn_dev_configure`: `rte_free(hv->channels[0])` frees memory not
> allocated with `rte_malloc`**
> 
> After closing the primary channel:
> 
> ```c
> rte_vmbus_chan_close(hv->channels[0]);
> rte_free(hv->channels[0]);
> hv->channels[0] = NULL;
> ```
> 
> And similarly in the `hn_attach` failure recovery:
> 
> ```c
> rte_vmbus_chan_close(hv->channels[0]);
> rte_free(hv->channels[0]);
> hv->channels[0] = NULL;
> ```
> 
> `channels[0]` is allocated by `rte_vmbus_chan_open()`. Whether this is
> `rte_malloc`'d or regular `malloc`'d depends on the vmbus implementation. If
> `rte_vmbus_chan_open` uses `malloc` internally (which is common for control
> structures), then `rte_free` is a mismatch that may silently corrupt the 
> heap. Need
> to verify the allocation source matches — or use the same free that
> `rte_vmbus_chan_close` would use. In `eth_hn_dev_uninit` (existing code), the
> primary channel is freed via `rte_free(hv->primary)`, not `rte_free(hv-
> >channels[0])`, suggesting these may be different pointers or the allocation 
> >path
> uses `rte_malloc`. Worth verifying.
> 
> **(E3) `hn_dev_configure`: RNDIS offload configuration applied before queue
> count change check, not re-applied after NVS teardown/reinit**
> 
> The sequence is:
> 
> ```c
> err = hn_rndis_conf_offload(hv, ...);  /* Configure RNDIS offload */ ...
> if (queue count unchanged)
>     goto skip_reinit;
> 
> /* Close subchannels, hn_detach(), chan_close, chan_open, hn_attach() */ ```
> 
> `hn_detach()` tears down the RNDIS session. `hn_attach()` establishes a new
> NVS/RNDIS session. But `hn_rndis_conf_offload()` was called *before* the
> teardown and is not called again after `hn_attach()`. The offload 
> configuration
> from the first call is lost when the RNDIS session is destroyed. The offload
> configuration should be re-applied after `hn_attach()`, or moved to after the 
> reinit
> block.
> 
> **(E4) `hn_vf_tx_queue_release` / `hn_vf_rx_queue_release`: setting queue
> pointer to NULL under read lock while Tx/Rx paths read the same pointer under
> read lock**
> 
> ```c
> rte_rwlock_read_lock(&hv->vf_lock);
> vf_dev = hn_get_vf_dev(hv);
> if (vf_dev && vf_dev->dev_ops->tx_queue_release) {
>     (*vf_dev->dev_ops->tx_queue_release)(vf_dev, queue_id);
>     vf_dev->data->tx_queues[queue_id] = NULL; } rte_rwlock_read_unlock(&hv-
> >vf_lock);
> ```
> 
> The fast path (`hn_xmit_pkts`) reads `vf_dev->data->tx_queues[queue_id]` under
> the same read lock:
> 
> ```c
> void *sub_q = vf_dev->data->tx_queues[queue_id];
> nb_tx = (*vf_dev->tx_pkt_burst)(sub_q, tx_pkts, nb_pkts); ```
> 
> Multiple readers can hold a rwlock simultaneously. This means a Tx thread 
> could
> read the queue pointer, then the release thread NULLs it and frees the 
> underlying
> memory, and the Tx thread calls into freed memory. The NULL assignment after
> release prevents *subsequent* accesses, but doesn't protect a concurrent 
> reader
> that already loaded the pointer. This needs a write lock, not a read lock.
> 
> ### Warnings
> 
> **(W1) `hn_dev_configure`: `subchan_cleanup` error path sets `hv->num_queues
> = 1` but doesn't restore RSS indirection table**
> 
> The `subchan_cleanup` label closes subchannels and resets `num_queues` to 1,
> but the RSS indirection table (`hv->rss_ind[]`) was already updated to 
> distribute
> across the new (failed) queue count:
> 
> ```c
> for (i = 0; i < NDIS_HASH_INDCNT; i++)
>     hv->rss_ind[i] = i % dev->data->nb_rx_queues; ```
> 
> If `nb_rx_queues` was e.g. 4 and subchannel allocation failed, the indirection
> table points to queues 0-3 but only queue 0 exists. The indirection table 
> should be
> reset to all-zeros in the cleanup path.
> 
> **(W2) `hn_dev_stop`: TX drain loop accesses `dev->data->tx_queues[i]` up to
> `hv->num_queues` without bounds check against configured queue count**
> 
> ```c
> for (i = 0; i < hv->num_queues; i++) {
>     struct hn_tx_queue *txq = dev->data->tx_queues[i];
>     if (txq == NULL)
>         continue;
>     ...
> ```
> 
> `hv->num_queues` could be larger than the actual number of configured Tx
> queues (`dev->data->nb_tx_queues`) if configuration reduced the count but
> `num_queues` wasn't yet updated. The NULL check provides some protection,
> but iterating with `RTE_MIN(hv->num_queues, dev->data->nb_tx_queues)` would
> be safer.
> 
> **(W3) Missing `Cc: [email protected]` and `Fixes:` tag for the
> `eth_hn_dev_uninit` subchannel leak fix**
> 
> The commit message states this change prevents resource leaks on device
> removal ("close subchannels before primary channel"). This is a bug fix for
> existing code, not just infrastructure for the new feature. It should have a 
> `Fixes:`
> tag referencing the commit that added subchannel support, and `Cc:
> [email protected]` for backport consideration. Alternatively, this could be 
> split into
> a separate bug-fix patch.
> 
> **(W4) `hn_dev_configure`: recovery path uses deeply nested if/else with
> duplicated error logging**
> 
> The `reinit_failed` block has three levels of nesting with repeated "recovery 
> failed,
> device unusable" messages. Consider restructuring with early-return or goto to
> reduce nesting and deduplicate the error messages.
> 
> **(W5) Comment style: multi-line comment at `reinit_failed` doesn't follow 
> DPDK
> style**
> 
> ```c
>       /* Device is in a broken state after failed reinit.
>        * Try to re-establish minimal connectivity.
>        */
> ```
> 
> DPDK style requires the first line of a multi-line comment to contain only 
> `/*`:
> 
> ```c
>       /*
>        * Device is in a broken state after failed reinit.
>        * Try to re-establish minimal connectivity.
>        */
> ```
> 
> ---
> 
> ### Summary
> 
> The most critical findings are E3 (lost RNDIS offload config after 
> teardown/reinit
> — the device will come back with default offload settings rather than what was
> requested), E4 (read lock insufficient for queue pointer NULLing — concurrent
> Tx/Rx can use-after-free), and E1 (recovery path after failed remap operates 
> on
> unmapped device). E2 needs verification of the allocation source for
> `channels[0]`.
> 
> The overall approach of full NVS teardown/reinit for queue count changes is
> sound, and the TX drain in `hn_dev_stop` and subchannel cleanup in
> `eth_hn_dev_uninit` are good improvements. The error handling is thorough for
> the most part — the recovery attempt in `reinit_failed` is a nice touch even 
> if the
> unmapped-device issue needs fixing.

RE: [EXTERNAL] Re: [PATCH] net/netvsc: support runtime queue count reconfiguration

Reply via email to