> -----Original Message-----
> From: Stephen Hemminger <[email protected]>
> Sent: Wednesday, May 6, 2026 7:50 PM
> To: Long Li <[email protected]>
> Cc: [email protected]; Wei Hu <[email protected]>; [email protected]
> Subject: [EXTERNAL] Re: [PATCH v2 1/7] net/netvsc: retry VF hotplug indefinitely until PCI device disappears
> 
> On Tue,  5 May 2026 19:05:22 -0700
> Long Li <[email protected]> wrote:
> 
> > After PCI rescan on Azure, the MANA kernel driver can take over 100
> > seconds to probe and create the /sys/bus/pci/devices/<dev>/net directory.
> > The previous fixed retry limit (NETVSC_MAX_HOTADD_RETRY=10, ~12
> > seconds) was insufficient, causing VF re-attach to fail with 'Failed
> > to parse PCI device' on systems with slow MANA driver initialization.
> >
> > Replace the fixed retry limit with an indefinite retry that only gives
> > up when the PCI device itself disappears from sysfs. This is safe because:
> >
> > - The retry uses rte_eal_alarm callbacks which are serialized on the EAL
> >   interrupt thread, preventing races with VF remove or device close paths.
> > - Device close (eth_hn_dev_uninit) explicitly cancels all pending hotplug
> >   alarms via rte_eal_alarm_cancel and frees the context.
> > - If the PCI device is removed while retrying, access() detects the
> >   missing sysfs path and stops immediately.
> >
> > A periodic NOTICE log every 30 retries (~30s) provides visibility into
> > long waits without flooding the log at DEBUG level.
> >
> > Fixes: a2a23a794b3a ("net/netvsc: support VF device hot add/remove")
> > Cc: [email protected]
> > Signed-off-by: Long Li <[email protected]>
> > ---
> Better but still seeing AI review warnings.

I have sent v3.

Thanks,
Long

> 
> Reviewed the v2 7-patch series against upstream drivers/net/netvsc/.
> Patches 1, 2, 3, and 5 are clean. Findings on the rest:
>
> Patch 4: the new "retry loop exiting" NOTICE fires on every termination
> including the success path, producing a noise alert on every successful
> VF re-attach.
>
> Patch 6: two warnings: (a) reaching directly into
> vf_dev->dev_ops->stats_get works only because eth_stats_qstats_get()
> already memset the buffers before invoking netvsc's callback, an
> undocumented dependency on the caller; (b) the else fallback to
> rte_eth_stats_get() is dead code: it returns -ENOTSUP for the same
> reason as the direct call.
>
> Patch 7: the recovering and recovery_success callbacks acquire vf_lock
> directly from event-callback context, departing from the existing
> INTR_RMV pattern that defers work via rte_eal_alarm_set precisely to
> avoid cross-driver lock-order assumptions. The unlocked vf_attached read
> in recovery_failed is a benign race that can be simplified by dropping
> the guard.
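
For readers following the thread, the retry policy described in the quoted
commit message (keep retrying while the PCI device is still in sysfs, stop
as soon as it disappears, and log a NOTICE every 30 retries) might look
roughly like the sketch below. This is not code from the patch: the enum,
the function names, and the HOTADD_* constants are hypothetical, and only
access() and snprintf() are real library calls.

```c
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* All names below are hypothetical; they are not taken from the driver. */
#define HOTADD_PATH_MAX        4096
#define HOTADD_NOTICE_INTERVAL 30   /* assumed: one retry per second */

enum hotadd_action {
	HOTADD_ATTACH,   /* <dev>/net exists: the VF can be attached */
	HOTADD_RETRY,    /* device present, kernel driver still probing */
	HOTADD_GIVE_UP,  /* PCI device vanished from sysfs: stop retrying */
};

/*
 * Decide what one retry iteration should do, given the sysfs directory of
 * the PCI device (for example a path under /sys/bus/pci/devices/).
 */
static enum hotadd_action
hotadd_next_action(const char *pci_dev_dir)
{
	char net_dir[HOTADD_PATH_MAX];
	int n;

	/* The device itself is gone: there is nothing left to wait for. */
	if (access(pci_dev_dir, F_OK) != 0)
		return HOTADD_GIVE_UP;

	n = snprintf(net_dir, sizeof(net_dir), "%s/net", pci_dev_dir);
	if (n < 0 || (size_t)n >= sizeof(net_dir))
		return HOTADD_GIVE_UP;	/* path cannot be represented */

	/* Device present but no net directory yet: try again later. */
	if (access(net_dir, F_OK) != 0)
		return HOTADD_RETRY;

	return HOTADD_ATTACH;
}

/* Emit a NOTICE only on every HOTADD_NOTICE_INTERVAL-th retry. */
static bool
hotadd_should_log_notice(unsigned int retries)
{
	return retries > 0 && retries % HOTADD_NOTICE_INTERVAL == 0;
}
```

In a driver, the alarm callback would presumably re-arm itself with
rte_eal_alarm_set() on HOTADD_RETRY and return without re-arming otherwise.
Keeping the decision in a side-effect-free helper also makes it easy to log
only on HOTADD_GIVE_UP, which would avoid the success-path noise noted for
patch 4.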
