On Tue,  5 May 2026 19:05:22 -0700
Long Li <[email protected]> wrote:

> After PCI rescan on Azure, the MANA kernel driver can take over 100
> seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
> The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12 seconds)
> was insufficient, causing VF re-attach to fail with 'Failed to parse PCI
> device' on systems with slow MANA driver initialization.
> 
> Replace the fixed retry limit with an indefinite retry that only gives up
> when the PCI device itself disappears from sysfs. This is safe because:
> 
> - The retry uses rte_eal_alarm callbacks which are serialized on the EAL
>   interrupt thread, preventing races with VF remove or device close paths.
> - Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
>   alarms via rte_eal_alarm_cancel and frees the context.
> - If the PCI device is removed while retrying, access() detects the
>   missing sysfs path and stops immediately.
> 
> A periodic NOTICE log every 30 retries (~30s) provides visibility into
> long waits without flooding the log at DEBUG level.
> 
> Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> Cc: [email protected]
> Signed-off-by: Long Li <[email protected]>
> ---
Better, but still seeing AI review warnings.

Reviewed the v2 7-patch series against upstream drivers/net/netvsc/. Patches 1, 
2, 3, and 5 are clean. Findings on the rest:

Patch 4: the new "retry loop exiting" NOTICE fires on every termination,
including the success path, so every successful VF re-attach produces a
spurious NOTICE-level message.
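
For Patch 4, a minimal sketch of one possible shape, assuming each
alarm-driven attempt reports an outcome to its caller: the periodic NOTICE
stays on the waiting path, and a terminal NOTICE is emitted only when the
PCI device disappears. The enum, hotadd_try_once(), and
RETRY_NOTICE_INTERVAL are hypothetical names, not netvsc code, and
fprintf() stands in for the driver's log macro.

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical outcome of one retry attempt (not netvsc's actual API). */
enum retry_result {
	RETRY_AGAIN, /* net/ directory not there yet: re-arm the alarm */
	RETRY_DONE,  /* VF net directory found: stop quietly */
	RETRY_GONE,  /* PCI device vanished from sysfs: stop and report */
};

/* Illustrative cadence, matching the "every 30 retries" in the patch. */
#define RETRY_NOTICE_INTERVAL 30

/*
 * One attempt, as an alarm callback might run it. pci_path and net_path
 * are sysfs paths supplied by the caller; both names are illustrative.
 */
static enum retry_result
hotadd_try_once(const char *pci_path, const char *net_path,
		unsigned int retries)
{
	assert(pci_path != NULL && net_path != NULL);

	if (access(pci_path, F_OK) != 0) {
		/* Failure exit: the only place a "giving up" NOTICE belongs. */
		fprintf(stderr, "NOTICE: %s removed after %u retries, giving up\n",
			pci_path, retries);
		return RETRY_GONE;
	}

	if (access(net_path, F_OK) == 0)
		return RETRY_DONE; /* success path: no exit NOTICE at all */

	/* Still waiting: periodic progress message only. */
	if (retries > 0 && retries % RETRY_NOTICE_INTERVAL == 0)
		fprintf(stderr, "NOTICE: still waiting for %s (%u retries)\n",
			net_path, retries);
	return RETRY_AGAIN;
}
```

In this sketch the caller would re-arm the alarm (for example with
rte_eal_alarm_set()) only on RETRY_AGAIN, so a successful attach ends
without any NOTICE.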

Patch 6: two warnings. (a) Reaching directly into vf_dev->dev_ops->stats_get
works only because eth_stats_qstats_get() has already zeroed the buffers
before invoking netvsc's callback, an undocumented dependency on the caller.
(b) The else fallback to rte_eth_stats_get() is dead code: it returns
-ENOTSUP under the same condition that rules out the direct call.
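
For Patch 6, one way to address both warnings, sketched with simplified
stand-in types rather than the real rte_eth_dev and eth_dev_ops structures
(the real stats_get callback takes more arguments): zero the buffer locally
before calling the VF's stats_get, and return -ENOTSUP directly when the
callback is absent instead of falling back to rte_eth_stats_get(). All
struct and function names below are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for the ethdev types; field names are hypothetical. */
struct vf_stats {
	uint64_t ipackets;
	uint64_t opackets;
};

struct vf_dev_ops {
	int (*stats_get)(struct vf_stats *stats);
};

/* Example VF callback that accumulates, so a stale buffer would leak through. */
static int
fake_vf_stats_get(struct vf_stats *stats)
{
	stats->ipackets += 10;
	return 0;
}

/*
 * Read VF stats through the ops table without relying on the caller having
 * zeroed the buffer, and without a fallback that could only fail.
 */
static int
vf_stats_read(const struct vf_dev_ops *ops, struct vf_stats *stats)
{
	assert(ops != NULL && stats != NULL);

	if (ops->stats_get == NULL)
		return -ENOTSUP; /* a generic fallback would fail the same way */

	/* Zero here so the result does not depend on who called us. */
	memset(stats, 0, sizeof(*stats));
	return ops->stats_get(stats);
}
```

Zeroing inside the helper makes the contract local: the result is the same
whether or not an upper layer already cleared the buffer.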

Patch 7: the recovering and recovery_success callbacks acquire vf_lock
directly from event-callback context. That departs from the existing
INTR_RMV pattern, which defers work via rte_eal_alarm_set() precisely to
avoid cross-driver lock-order assumptions. The unlocked vf_attached read in
recovery_failed is a benign race; the code can be simplified by dropping
the guard.
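
For Patch 7, a self-contained sketch of the deferral pattern the INTR_RMV
path follows: the event callback takes no driver lock and only schedules a
handler, and the handler acquires vf_lock later from alarm context.
defer_call() and run_pending() are hypothetical stand-ins for
rte_eal_alarm_set(us, cb, arg) and the EAL interrupt thread firing the
alarm; struct hn_state is illustrative, not netvsc's real per-port data.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state; vf_lock mirrors the name used in the review. */
struct hn_state {
	pthread_mutex_t vf_lock;
	bool vf_attached;
};

/* Single-slot stand-in for the EAL alarm list. */
static void (*pending_fn)(void *);
static void *pending_arg;

/* Stand-in for rte_eal_alarm_set(): record one deferred call. */
static int
defer_call(void (*fn)(void *), void *arg)
{
	if (pending_fn != NULL)
		return -1; /* only one slot in this sketch */
	pending_fn = fn;
	pending_arg = arg;
	return 0;
}

/* Runs later, outside the reporting driver's event-callback context. */
static void
recovery_success_deferred(void *arg)
{
	struct hn_state *st = arg;

	pthread_mutex_lock(&st->vf_lock);
	st->vf_attached = true;
	pthread_mutex_unlock(&st->vf_lock);
}

/* Event callback: touches no driver lock, only schedules the work. */
static int
on_recovery_success(struct hn_state *st)
{
	return defer_call(recovery_success_deferred, st);
}

/* Stand-in for the EAL interrupt thread running the expired alarm. */
static void
run_pending(void)
{
	void (*fn)(void *) = pending_fn;

	pending_fn = NULL;
	if (fn != NULL)
		fn(pending_arg);
}
```

Because the callback never takes vf_lock, it cannot form a lock-order cycle
with whatever the reporting driver holds while it dispatches the event; the
lock is taken only on the alarm thread.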
