> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: Wednesday, May 6, 2026 7:50 PM
> To: Long Li <[email protected]>
> Cc: [email protected]; Wei Hu <[email protected]>; [email protected]
> Subject: [EXTERNAL] Re: [PATCH v2 1/7] net/netvsc: retry VF hotplug
> indefinitely until PCI device disappears
>
> On Tue, 5 May 2026 19:05:22 -0700
> Long Li <[email protected]> wrote:
>
> > After PCI rescan on Azure, the MANA kernel driver can take over 100
> > seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
> > The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12
> > seconds) was insufficient, causing VF re-attach to fail with 'Failed
> > to parse PCI device' on systems with slow MANA driver initialization.
> >
> > Replace the fixed retry limit with an indefinite retry that only gives
> > up when the PCI device itself disappears from sysfs. This is safe because:
> >
> > - The retry uses rte_eal_alarm callbacks which are serialized on the EAL
> >   interrupt thread, preventing races with VF remove or device close paths.
> > - Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
> >   alarms via rte_eal_alarm_cancel and frees the context.
> > - If the PCI device is removed while retrying, access() detects the
> >   missing sysfs path and stops immediately.
> >
> > A periodic NOTICE log every 30 retries (~30s) provides visibility into
> > long waits without flooding the log at DEBUG level.
> >
> > Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> > Cc: [email protected]
> > Signed-off-by: Long Li <[email protected]>
> > ---
>
> Better but still seeing AI review warnings.
I have sent v3.

Thanks,
Long

>
> Reviewed the v2 7-patch series against upstream drivers/net/netvsc/.
> Patches 1, 2, 3, and 5 are clean. Findings on the rest:
>
> Patch 4: the new "retry loop exiting" NOTICE fires on every termination
> including the success path, producing a noise alert on every successful
> VF re-attach.
>
> Patch 6: two warnings:
> (a) reaching directly into vf_dev->dev_ops->stats_get works only because
>     eth_stats_qstats_get() already memset the buffers before invoking
>     netvsc's callback, an undocumented dependency on the caller;
> (b) the else fallback to rte_eth_stats_get() is dead code: it returns
>     -ENOTSUP for the same reason as the direct call.
>
> Patch 7: the recovering and recovery_success callbacks acquire vf_lock
> directly from event-callback context, departing from the existing INTR_RMV
> pattern that defers work via rte_eal_alarm_set precisely to avoid
> cross-driver lock-order assumptions. The unlocked vf_attached read in
> recovery_failed is a benign race that can be simplified by dropping the
> guard.
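
For reference, the retry policy described in the quoted commit message can be sketched roughly as below. This is a standalone illustration and not the driver code: hotadd_state, hotadd_check(), hotadd_should_log() and HOTADD_NOTICE_INTERVAL are hypothetical names, and the caller is assumed to pass the PCI device sysfs path and its net subdirectory. Only the use of access() and the 30-retry NOTICE interval come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical result codes for one retry step; not taken from the driver. */
enum hotadd_state {
	HOTADD_READY,	/* net directory exists, the VF can be parsed */
	HOTADD_RETRY,	/* PCI device present, kernel driver still probing */
	HOTADD_GONE,	/* PCI device vanished from sysfs, stop retrying */
};

/* Assumed interval, matching "every 30 retries" in the commit message. */
#define HOTADD_NOTICE_INTERVAL 30

/*
 * Decide what a single retry attempt should do.
 * dev_path: sysfs directory of the PCI device.
 * net_path: the net directory the kernel driver creates once probed.
 */
static enum hotadd_state
hotadd_check(const char *dev_path, const char *net_path)
{
	assert(dev_path != NULL && net_path != NULL);

	if (access(net_path, F_OK) == 0)
		return HOTADD_READY;
	if (access(dev_path, F_OK) != 0)
		return HOTADD_GONE;	/* only condition that ends the retry */
	return HOTADD_RETRY;		/* a driver would re-arm its alarm here */
}

/* True on every HOTADD_NOTICE_INTERVAL-th retry, never on the first pass. */
static bool
hotadd_should_log(unsigned int retries)
{
	return retries > 0 && retries % HOTADD_NOTICE_INTERVAL == 0;
}
```

Returning a distinct READY state also makes it straightforward to log only on the GONE path, which is one way to address the Patch 4 finding about the exit NOTICE firing on success.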

