On Fri, 20 Feb 2026 18:45:21 -0800 [email protected] wrote:

> From: Long Li <[email protected]>
>
> Netvsc gets notification from VSP on VF add/remove over VMBUS, but the
> timing may not match the DPDK sequence of device events triggered from
> uevents from kernel.
>
> Remove the retry logic from the code when attach to VF and rely on DPDK
> event to attach to VF. With this change, both the notifications from VSP
> and the DPDK will attempt a VF attach.
>
> Also implement locking when checking on all VF related fields.
>
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: [email protected]
>
> Signed-off-by: Long Li <[email protected]>
An AI review spotted a related issue.

**Patch 2 (net/netvsc: fix race conditions on VF add/remove events)** is the most complex patch in the series.

**What it fixes correctly:** The old Tx/Rx paths had a TOCTOU race: they checked `vf_vsc_switched` without the lock, acquired the lock, then re-checked. A VF remove could complete between the first check and the lock acquisition. The new code takes the read lock *before* any VF state checks, which is the correct fix, and the lock is properly released on both paths.

The upgrade of `hn_vf_close()` from a read lock to a write lock is also a real bug fix, since that function modifies `vf_attached` and calls `rte_eth_dev_close()`.

Moving callback registration into `hn_vf_attach()` with proper rollback (via the new `hn_vf_detach()` helper) is a good structural improvement that ties callback lifetime to the VF attach/detach lifecycle.

The unconditional clear of `vf_vsc_switched` in `hn_vf_remove_unlocked()` is correct: if the VF is being removed, the switched flag must be cleared regardless of whether `hn_nvs_set_datapath(SYNTHETIC)` succeeded.

**One potential concern (~60% confidence):** If the VF is successfully configured and started but `hn_nvs_set_datapath(VF)` fails at the `switch_data_path:` label, the function returns an error yet leaves the VF started and attached, and the callers don't clean this up. This is a pre-existing design issue the patch doesn't worsen, and the hypervisor may retry, but it could confuse subsequent add/remove cycles.

