On Fri, 20 Feb 2026 18:45:21 -0800
[email protected] wrote:

> From: Long Li <[email protected]>
> 
> Netvsc gets notifications from the VSP on VF add/remove over VMBUS, but
> the timing may not match the DPDK sequence of device events triggered
> by kernel uevents.
> 
> Remove the retry logic used when attaching to the VF, and rely on DPDK
> device events to trigger the VF attach. With this change, both the VSP
> notification and the DPDK event will attempt a VF attach.
> 
> Also implement locking when checking all VF-related fields.
> 
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: [email protected]
> 
> Signed-off-by: Long Li <[email protected]>

An AI review spotted a related issue.

**Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** — the 
most complex patch in the series.

**What it fixes correctly:**

The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without 
the lock, acquired the lock, then re-checked. A VF remove could complete 
between the first check and the lock acquisition. The new code takes the read 
lock *before* any VF state checks — correct fix. The lock is properly released 
on both paths.

The upgrade of `hn_vf_close()` from read lock to write lock is also a real bug 
fix, since it modifies `vf_attached` and calls `rte_eth_dev_close()`.

Moving callback registration into `hn_vf_attach()` with proper rollback (via 
the new `hn_vf_detach()` helper) is a good structural improvement that ties 
callback lifetime to VF attach/detach lifecycle.
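The attach/rollback structure can be modeled as follows. This is a simplified sketch of the pattern, not the driver's code: `vf_attach`/`vf_detach` echo the names in the patch, but the `vf_state` struct and the boolean bookkeeping are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of tying callback lifetime to the VF
 * attach/detach lifecycle. */
struct vf_state {
	bool cb_registered;
	bool configured;
	bool attached;
};

/* Undo in reverse order; safe to call on a partially attached VF,
 * which is what makes it usable as the rollback path. */
static void vf_detach(struct vf_state *s)
{
	s->configured = false;
	if (s->cb_registered)
		s->cb_registered = false;   /* unregister event callback */
	s->attached = false;
}

static int vf_attach(struct vf_state *s, bool configure_ok)
{
	s->cb_registered = true;            /* register event callback */
	s->configured = configure_ok;       /* configure + start the VF */
	if (!s->configured) {
		vf_detach(s);   /* rollback unwinds the callback too */
		return -1;
	}
	s->attached = true;
	return 0;
}
```

The point of the structure is that no failure path can leave the callback registered for a VF that never finished attaching.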

The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is 
correct — if the VF is being removed, the switched flag must be cleared 
regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded.
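Sketched out, the remove path looks like this. The stub `set_datapath_synthetic()` stands in for `hn_nvs_set_datapath(SYNTHETIC)`, and the struct is illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct hv_data {
	bool vf_vsc_switched;
	void *vf_dev;
	int datapath_ret;   /* canned result of the NVS call, for the sketch */
};

/* Stub for hn_nvs_set_datapath(SYNTHETIC). */
static int set_datapath_synthetic(struct hv_data *hv)
{
	return hv->datapath_ret;
}

static void vf_remove_unlocked(struct hv_data *hv)
{
	if (set_datapath_synthetic(hv) != 0) {
		/* log the failure, but do not bail out: the VF is
		 * going away whether or not the host acknowledged the
		 * datapath switch */
	}
	hv->vf_vsc_switched = false;   /* cleared unconditionally */
	hv->vf_dev = NULL;
}
```

Leaving the flag set after a failed switch would make the Tx/Rx paths keep steering traffic to a VF that no longer exists.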

**One potential concern (~60% confidence):**

If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)` 
fails at the `switch_data_path:` label, the function returns an error but 
leaves the VF started and attached. The callers don't clean this up. This is a 
pre-existing design issue the patch doesn't worsen, and the hypervisor may 
retry, but it could confuse subsequent add/remove cycles.
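If the maintainers want to harden this later, one option is to unwind the VF on a datapath-switch failure instead of leaving it started. This is purely a suggestion sketched with hypothetical names (`vf_add`, `vf_rollback`), not code from the patch:

```c
#include <assert.h>
#include <stdbool.h>

struct vf_state {
	bool started;
	bool attached;
};

/* Suggested cleanup: stop and detach the VF so the next add/remove
 * cycle starts from a clean state. */
static void vf_rollback(struct vf_state *s)
{
	s->started = false;    /* rte_eth_dev_stop() on the VF port */
	s->attached = false;   /* detach + unregister callback */
}

static int vf_add(struct vf_state *s, bool datapath_ok)
{
	s->started = true;     /* VF configured and started */
	s->attached = true;
	if (!datapath_ok) {    /* hn_nvs_set_datapath(VF) failed */
		vf_rollback(s);
		return -1;
	}
	return 0;
}
```

Whether that is preferable to leaving the VF up for a hypervisor retry is a policy question for the maintainers.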
