On Tue, 16 Jun 2026 05:31:58 -0700
Wei Hu <[email protected]> wrote:

> teardown immediately (dev_stop, secondary IPC, dev_close, MR cache
> free) before waiting for the hardware recovery timer to fire. This
> avoids blocking the EAL interrupt thread on multi-second IPC
> timeouts and ibverbs calls. After the recovery delay, the thread
> unregisters the interrupt handler, re-probes the PCI device,
> reinitializes MR caches, and restarts queues. Each function owns
> its own lock scope with no lock hand-off between threads.
> 
> Each queue has an atomic burst_state variable where bit 0 is the
> in-burst flag and bit 1 is a blocked flag. The data path uses a
> single compare-and-swap (0 to 1) to enter a burst, which fails
> immediately if the blocked bit is set. The reset path sets the
> blocked bit via atomic fetch-or and polls bit 0 to wait for
> in-flight bursts to drain. This single-variable design avoids the
> need for sequential consistency ordering.
> 
> A per-device mutex serializes the reset path with ethdev
> operations. The mutex uses PTHREAD_PROCESS_SHARED for multi-process
> support and is held across blocking IB verbs calls. A trylock
> helper encapsulates the lock acquisition and device state check
> for all ethdev operation wrappers. Operations that cannot wait
> (configure, queue setup) return -EBUSY during reset, while
> dev_stop and dev_close join the reset thread before acquiring
> the lock to ensure proper sequencing.
> 
> The reset thread keeps reset_thread_active true throughout its
> lifetime. mana_join_reset_thread uses rte_thread_equal to detect
> the self-join case (when a recovery callback calls dev_stop or
> dev_close from the reset thread itself) and calls
> rte_thread_detach instead of join, so thread resources are freed
> on exit. External callers join normally.
> 
> The condvar wait in the reset thread uses a predicate loop that
> checks dev_state under reset_cond_mutex, so a PCI remove signal
> that arrives before the thread enters the wait is not lost. The
> PCI remove callback sets dev_state to RESET_FAILED under the
> same mutex before signaling. A lock/unlock barrier on
> reset_ops_lock in the PCI remove path ensures teardown has
> completed before emitting the INTR_RMV event.
> 
> Multi-process support is included: secondary processes unmap and
> remap doorbell pages via IPC during the reset enter and exit
> phases. The secondary RESET_EXIT handler closes the received fd
> unconditionally after processing, even when the doorbell page is
> already mapped. Data path functions in both primary and secondary
> processes check the device state atomically and return early when
> the device is not active.
> 
> The driver emits RTE_ETH_EVENT_ERR_RECOVERING before entering the
> reset path so that upper layers (e.g. netvsc) can switch their
> data path before queues are stopped. The event is emitted outside
> the reset lock to avoid deadlock if the callback calls dev_stop or
> dev_close. On completion, the driver emits RECOVERY_SUCCESS or
> RECOVERY_FAILED after releasing the lock. If a recovery callback
> triggers dev_stop or dev_close, the self-join detection in
> mana_join_reset_thread detaches the thread to avoid deadlock. If
> the enter phase fails internally, RECOVERY_FAILED is sent
> immediately so the application receives a terminal event. A PCI
> device removal event callback distinguishes hot-remove from
> service reset.
> 
> Documentation for the device reset feature is added in the MANA
> NIC guide and the 26.07 release notes.
> 
> Signed-off-by: Wei Hu <[email protected]>
> ---
Applied to next-net

Reply via email to