On Fri, 20 Feb 2026 18:45:22 -0800
[email protected] wrote:

> From: Long Li <[email protected]>
> 
> When a VF device is hot-removed by the primary process, secondary
> processes must be notified to release their references to the VF port.
> Without this, secondary processes retain stale port references leading
> to crashes or undefined behavior when accessing the removed device.
> 
> This patch adds multi-process communication infrastructure to coordinate
> VF removal across all processes:
> 
> - Shared memory (netvsc_shared_data) to track secondary process count
> - Multi-process message handlers (NETVSC_MP_REQ_VF_REMOVE) to notify
>   secondaries when primary removes a VF device
> - Secondary handler calls rte_eth_dev_release_port() to cleanly release
>   the VF port in its own process space
> - Primary waits for all secondaries to acknowledge removal before
>   proceeding
> 
> The implementation uses rte_mp_request_sync() to ensure all secondary
> processes respond within NETVSC_MP_REQ_TIMEOUT_SEC (5 seconds) before
> the primary completes the VF removal sequence.
> 
> Fixes: 7fc4c0997b04 ("net/netvsc: fix hot adding multiple VF PCI devices")
> Cc: [email protected]
> 
> Signed-off-by: Long Li <[email protected]>

AI review feedback:

**Patch 3 (net/netvsc: add multi-process VF device removal support)** — adds MP 
infrastructure to coordinate VF removal across processes.

**Three concerns:**

1. **Race window on `secondary_cnt` during probe (~50% confidence).** The 
secondary increments `secondary_cnt` *after* `rte_eth_dev_probing_finish()`, 
but `netvsc_init_once()` and device setup happen before that. A primary 
removing a VF during this window sees `secondary_cnt == 0`, skips 
`rte_mp_request_sync()`, and the secondary never gets notified — leaving it 
with a stale VF port reference.

2. **Misleading "VF is already locked by primary" comment.** In 
`netvsc_secondary_handle_device_remove()`, the code reads `hv->vf_ctx.vf_port` 
from shared memory with a comment saying the primary's lock protects it. But 
`rte_rwlock_t` is process-local — it doesn't work cross-process. The actual 
synchronization comes from the MP message exchange itself (the primary sends 
the message after setting state, the secondary handles it after receiving). The 
comment should reflect that.

3. **`netvsc_init_once()` not protected by the spinlock.** It's called from 
`eth_hn_probe()` without `netvsc_shared_data_lock`, while 
`netvsc_uninit_once()` is called *inside* the lock. If two netvsc devices probe 
concurrently in the same process, the `init_done` flag check could race. Low 
risk since DPDK probe is typically single-threaded, but inconsistent with the 
uninit path.

**Minor style notes:** `MZ_NETVSC_SHARED_DATA` uses macro-style naming but is a 
`const char *` variable — could be a `#define` for consistency with 
`NETVSC_MP_NAME`. The stub `netvsc_mp_primary_handle()` that always returns 0 
is benign but could mask future protocol errors.

**Overall:** Sound infrastructure, suitable for merging with the comment fix 
and awareness of the probe-time race window.

Reply via email to