On Tue, 16 Jun 2026 05:31:58 -0700 Wei Hu <[email protected]> wrote:
> teardown immediately (dev_stop, secondary IPC, dev_close, MR cache
> free) before waiting for the hardware recovery timer to fire. This
> avoids blocking the EAL interrupt thread on multi-second IPC
> timeouts and ibverbs calls. After the recovery delay, the thread
> unregisters the interrupt handler, re-probes the PCI device,
> reinitializes MR caches, and restarts queues. Each function owns
> its own lock scope with no lock hand-off between threads.
>
> Each queue has an atomic burst_state variable where bit 0 is the
> in-burst flag and bit 1 is a blocked flag. The data path uses a
> single compare-and-swap (0 to 1) to enter a burst, which fails
> immediately if the blocked bit is set. The reset path sets the
> blocked bit via atomic fetch-or and polls bit 0 to wait for
> in-flight bursts to drain. This single-variable design avoids the
> need for sequential consistency ordering.
>
> A per-device mutex serializes the reset path with ethdev
> operations. The mutex uses PTHREAD_PROCESS_SHARED for multi-process
> support and is held across blocking IB verbs calls. A trylock
> helper encapsulates the lock acquisition and device state check
> for all ethdev operation wrappers. Operations that cannot wait
> (configure, queue setup) return -EBUSY during reset, while
> dev_stop and dev_close join the reset thread before acquiring
> the lock to ensure proper sequencing.
>
> The reset thread keeps reset_thread_active true throughout its
> lifetime. mana_join_reset_thread uses rte_thread_equal to detect
> the self-join case (when a recovery callback calls dev_stop or
> dev_close from the reset thread itself) and calls
> rte_thread_detach instead of join, so thread resources are freed
> on exit. External callers join normally.
>
> The condvar wait in the reset thread uses a predicate loop that
> checks dev_state under reset_cond_mutex, so a PCI remove signal
> that arrives before the thread enters the wait is not lost. The
> PCI remove callback sets dev_state to RESET_FAILED under the
> same mutex before signaling. A lock/unlock barrier on
> reset_ops_lock in the PCI remove path ensures teardown has
> completed before emitting the INTR_RMV event.
>
> Multi-process support is included: secondary processes unmap and
> remap doorbell pages via IPC during the reset enter and exit
> phases. The secondary RESET_EXIT handler closes the received fd
> unconditionally after processing, even when the doorbell page is
> already mapped. Data path functions in both primary and secondary
> processes check the device state atomically and return early when
> the device is not active.
>
> The driver emits RTE_ETH_EVENT_ERR_RECOVERING before entering the
> reset path so that upper layers (e.g. netvsc) can switch their
> data path before queues are stopped. The event is emitted outside
> the reset lock to avoid deadlock if the callback calls dev_stop or
> dev_close. On completion, the driver emits RECOVERY_SUCCESS or
> RECOVERY_FAILED after releasing the lock. If a recovery callback
> triggers dev_stop or dev_close, the self-join detection in
> mana_join_reset_thread detaches the thread to avoid deadlock. If
> the enter phase fails internally, RECOVERY_FAILED is sent
> immediately so the application receives a terminal event. A PCI
> device removal event callback distinguishes hot-remove from
> service reset.
>
> Documentation for the device reset feature is added in the MANA
> NIC guide and the 26.07 release notes.
>
> Signed-off-by: Wei Hu <[email protected]>
> ---

Applied to next-net
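
For anyone reading along, the burst_state gate described in the quoted
commit message can be sketched roughly as below, using plain C11 atomics.
This is only an illustration and not the driver code: the struct and the
function names (queue_gate, burst_enter, burst_exit, gate_block,
gate_unblock) and the bit macros are hypothetical, the memory orders are
my assumption, and it assumes one data path thread per queue, since the
0-to-1 compare-and-swap also fails while another burst is in flight.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit layout comes from the commit message. */
#define BURST_IN_FLIGHT 0x1u /* bit 0: a burst is currently running */
#define BURST_BLOCKED   0x2u /* bit 1: reset path has blocked the queue */

struct queue_gate {
	_Atomic uint32_t burst_state;
};

/* Data path: enter a burst with a single CAS from 0 to 1.
 * Fails at once if the blocked bit (or the in-flight bit) is set. */
static bool burst_enter(struct queue_gate *q)
{
	uint32_t expected = 0;

	return atomic_compare_exchange_strong_explicit(&q->burst_state,
			&expected, BURST_IN_FLIGHT,
			memory_order_acquire, memory_order_relaxed);
}

/* Data path: leave the burst by clearing bit 0 only. */
static void burst_exit(struct queue_gate *q)
{
	atomic_fetch_and_explicit(&q->burst_state, ~BURST_IN_FLIGHT,
			memory_order_release);
}

/* Reset path: set the blocked bit, then poll bit 0 until any
 * in-flight burst has drained. A real driver would pause or yield
 * inside the loop instead of spinning tightly. */
static void gate_block(struct queue_gate *q)
{
	atomic_fetch_or_explicit(&q->burst_state, BURST_BLOCKED,
			memory_order_acq_rel);
	while (atomic_load_explicit(&q->burst_state,
			memory_order_acquire) & BURST_IN_FLIGHT)
		;
}

/* Reset path: reopen the queue once recovery has completed. */
static void gate_unblock(struct queue_gate *q)
{
	atomic_fetch_and_explicit(&q->burst_state, ~BURST_BLOCKED,
			memory_order_release);
}
```

Because both flags live in one word, the enter CAS and the blocking
fetch-or are ordered against each other by that word's modification
order, which is presumably why the commit message says sequential
consistency is not needed.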

