Re: [PATCH] net/netvsc: support runtime queue count reconfiguration

Stephen Hemminger Wed, 25 Feb 2026 14:41:55 -0800

On Wed, 25 Feb 2026 11:50:40 -0800
Long Li <[email protected]> wrote:


> Add support for changing the number of RX/TX queues at runtime
> via port stop/configure/start. When the queue count changes,
> perform a full NVS/RNDIS teardown and reinit to allocate fresh
> VMBus subchannels matching the new queue count, then reconfigure
> RSS indirection table accordingly.
> 
> Key changes:
> - hn_dev_configure: detect queue count changes and perform full
>   NVS session reinit with subchannel teardown/recreation
> - hn_dev_stop: drain pending TX completions (up to 1s) to prevent
>   stale completions from corrupting queue state after reconfig
> - eth_hn_dev_uninit: close subchannels before primary channel to
>   prevent resource leaks on device removal
> - hn_vf_tx/rx_queue_release: null VF queue pointers after release
>   to prevent use-after-free when the ethdev framework releases
>   excess queues during VF reconfiguration
> 
> Signed-off-by: Long Li <[email protected]>
> ---

Sorry for AI review overload here...

## Patch Review: `[PATCH] net/netvsc: support runtime queue count 
reconfiguration`



### Errors

**(E1) `hn_dev_configure`: `rte_vmbus_unmap_device` after closing subchannels 
and primary channel — subsequent `rte_vmbus_map_device` failure leaves device 
in unrecoverable state with no channel and no mapping**

The reinit sequence is:

```c
hn_detach(hv);
rte_vmbus_chan_close(hv->channels[0]);
rte_free(hv->channels[0]);
hv->channels[0] = NULL;

rte_vmbus_unmap_device(hv->vmbus);
err = rte_vmbus_map_device(hv->vmbus);
if (err) {
    PMD_DRV_LOG(ERR, "Could not re-map vmbus device!");
    goto reinit_failed;
}
```

If `rte_vmbus_map_device` fails, execution goes to `reinit_failed`, which 
attempts recovery by calling `rte_vmbus_chan_open`. But the device is unmapped 
at this point — the MMIO regions are gone. `rte_vmbus_chan_open` on an unmapped 
device is undefined behavior (it will try to access unmapped ring buffers). The 
recovery path cannot work without a successful map. The recovery block should 
check whether the device is mapped before attempting channel open, or 
re-attempt the map first.

**(E2) `hn_dev_configure`: `rte_free(hv->channels[0])` frees memory not 
allocated with `rte_malloc`**

After closing the primary channel:

```c
rte_vmbus_chan_close(hv->channels[0]);
rte_free(hv->channels[0]);
hv->channels[0] = NULL;
```

And similarly in the `hn_attach` failure recovery:

```c
rte_vmbus_chan_close(hv->channels[0]);
rte_free(hv->channels[0]);
hv->channels[0] = NULL;
```

`channels[0]` is allocated by `rte_vmbus_chan_open()`. Whether this is 
`rte_malloc`'d or regular `malloc`'d depends on the vmbus implementation. If 
`rte_vmbus_chan_open` uses `malloc` internally (which is common for control 
structures), then `rte_free` is a mismatch that may silently corrupt the heap. 
Need to verify the allocation source matches — or use the same free that 
`rte_vmbus_chan_close` would use. In `eth_hn_dev_uninit` (existing code), the 
primary channel is freed via `rte_free(hv->primary)`, not 
`rte_free(hv->channels[0])`, suggesting these may be different pointers or the 
allocation path uses `rte_malloc`. Worth verifying.

**(E3) `hn_dev_configure`: RNDIS offload configuration applied before queue 
count change check, not re-applied after NVS teardown/reinit**

The sequence is:

```c
err = hn_rndis_conf_offload(hv, ...);  /* Configure RNDIS offload */
...
if (queue count unchanged)
    goto skip_reinit;

/* Close subchannels, hn_detach(), chan_close, chan_open, hn_attach() */
```

`hn_detach()` tears down the RNDIS session. `hn_attach()` establishes a new 
NVS/RNDIS session. But `hn_rndis_conf_offload()` was called *before* the 
teardown and is not called again after `hn_attach()`. The offload configuration 
from the first call is lost when the RNDIS session is destroyed. The offload 
configuration should be re-applied after `hn_attach()`, or moved to after the 
reinit block.

**(E4) `hn_vf_tx_queue_release` / `hn_vf_rx_queue_release`: setting queue 
pointer to NULL under read lock while Tx/Rx paths read the same pointer under 
read lock**

```c
rte_rwlock_read_lock(&hv->vf_lock);
vf_dev = hn_get_vf_dev(hv);
if (vf_dev && vf_dev->dev_ops->tx_queue_release) {
    (*vf_dev->dev_ops->tx_queue_release)(vf_dev, queue_id);
    vf_dev->data->tx_queues[queue_id] = NULL;
}
rte_rwlock_read_unlock(&hv->vf_lock);
```

The fast path (`hn_xmit_pkts`) reads `vf_dev->data->tx_queues[queue_id]` under 
the same read lock:

```c
void *sub_q = vf_dev->data->tx_queues[queue_id];
nb_tx = (*vf_dev->tx_pkt_burst)(sub_q, tx_pkts, nb_pkts);
```

Multiple readers can hold a rwlock simultaneously. This means a Tx thread could 
read the queue pointer, then the release thread NULLs it and frees the 
underlying memory, and the Tx thread calls into freed memory. The NULL 
assignment after release prevents *subsequent* accesses, but doesn't protect a 
concurrent reader that already loaded the pointer. This needs a write lock, not 
a read lock.

### Warnings

**(W1) `hn_dev_configure`: `subchan_cleanup` error path sets `hv->num_queues = 
1` but doesn't restore RSS indirection table**

The `subchan_cleanup` label closes subchannels and resets `num_queues` to 1, 
but the RSS indirection table (`hv->rss_ind[]`) was already updated to 
distribute across the new (failed) queue count:

```c
for (i = 0; i < NDIS_HASH_INDCNT; i++)
    hv->rss_ind[i] = i % dev->data->nb_rx_queues;
```

If `nb_rx_queues` was e.g. 4 and subchannel allocation failed, the indirection 
table points to queues 0-3 but only queue 0 exists. The indirection table 
should be reset to all-zeros in the cleanup path.

**(W2) `hn_dev_stop`: TX drain loop accesses `dev->data->tx_queues[i]` up to 
`hv->num_queues` without bounds check against configured queue count**

```c
for (i = 0; i < hv->num_queues; i++) {
    struct hn_tx_queue *txq = dev->data->tx_queues[i];
    if (txq == NULL)
        continue;
    ...
```

`hv->num_queues` could be larger than the actual number of configured Tx queues 
(`dev->data->nb_tx_queues`) if configuration reduced the count but `num_queues` 
wasn't yet updated. The NULL check provides some protection, but iterating with 
`RTE_MIN(hv->num_queues, dev->data->nb_tx_queues)` would be safer.

**(W3) Missing `Cc: [email protected]` and `Fixes:` tag for the 
`eth_hn_dev_uninit` subchannel leak fix**

The commit message states this change prevents resource leaks on device removal 
("close subchannels before primary channel"). This is a bug fix for existing 
code, not just infrastructure for the new feature. It should have a `Fixes:` 
tag referencing the commit that added subchannel support, and `Cc: 
[email protected]` for backport consideration. Alternatively, this could be split 
into a separate bug-fix patch.

**(W4) `hn_dev_configure`: recovery path uses deeply nested if/else with 
duplicated error logging**

The `reinit_failed` block has three levels of nesting with repeated "recovery 
failed, device unusable" messages. Consider restructuring with early-return or 
goto to reduce nesting and deduplicate the error messages.

**(W5) Comment style: multi-line comment at `reinit_failed` doesn't follow DPDK 
style**

```c
        /* Device is in a broken state after failed reinit.
         * Try to re-establish minimal connectivity.
         */
```

DPDK style requires the first line of a multi-line comment to contain only `/*`:

```c
        /*
         * Device is in a broken state after failed reinit.
         * Try to re-establish minimal connectivity.
         */
```

---

### Summary

The most critical findings are E3 (lost RNDIS offload config after 
teardown/reinit — the device will come back with default offload settings 
rather than what was requested), E4 (read lock insufficient for queue pointer 
NULLing — concurrent Tx/Rx can use-after-free), and E1 (recovery path after 
failed remap operates on unmapped device). E2 needs verification of the 
allocation source for `channels[0]`.

The overall approach of full NVS teardown/reinit for queue count changes is 
sound, and the TX drain in `hn_dev_stop` and subchannel cleanup in 
`eth_hn_dev_uninit` are good improvements. The error handling is thorough for 
the most part — the recovery attempt in `reinit_failed` is a nice touch even if 
the unmapped-device issue needs fixing.

Re: [PATCH] net/netvsc: support runtime queue count reconfiguration

Reply via email to