> Subject: [PATCH v5] net/iavf: fix duplicate VF reset during PF reset recovery
> 
> During PF initiated reset recovery, iavf_dev_close() sending
> an extra VIRTCHNL_OP_RESET_VF while recovery is already in progress.
> That second reset can leave PF/VF virtchnl state inconsistent and
> cause VIRTCHNL_OP_CONFIG_VSI_QUEUES to fail with ERR_PARAM after
> ToR link flap/power-cycle, leaving the VF unable to recover.
> This results in connection loss.
> 
> This patch introduces a new flag 'pf_reset_in_progress', that is
> set only when iavf_handle_hw_reset() is entered with
> vf_initiated_reset as false and is cleared on exit.
> Also, close-time VF reset and related close-time virtchnl
> operations are skipped when PF triggered reset recovery is set.
> This is done to avoid a duplicate VF reset, and keep normal
> behavior for application-driven close or VF-initiated reinit.
> 
> Fixes: 675a104e2e94 ("net/iavf: fix abnormal disable HW interrupt")
> Fixes: b34fe66ea893 ("net/iavf: delay VF reset command")
> Fixes: 5e03e316c753 ("net/iavf: handle virtchnl event message without
> interrupt")
> Cc: [email protected]
> 
> Signed-off-by: Anurag Mandal <[email protected]>

Acked-by: Ciara Loftus <[email protected]>

I think you may need to respin due to patch application failure.
I have some suggestions for improving the comments/release notes
that you could include in the next version. Code looks good to me.

> ---
> V5: Addressed Ciara Loftus's comments
>  - added separate flag for PF initiated reset recovery
> V4: Addressed Ciara Loftus's comments
>  - split VF reset from other code changes
> V3: Addressed latest ai-code-review comments
> V2: Addressed ai-code-review comments
> 
>  doc/guides/rel_notes/release_26_07.rst |  3 ++
>  drivers/net/intel/iavf/iavf.h          |  7 +++++
>  drivers/net/intel/iavf/iavf_ethdev.c   | 40 +++++++++++++++-----------
>  drivers/net/intel/iavf/iavf_vchnl.c    | 18 ++++++++++--
>  4 files changed, 49 insertions(+), 19 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/release_26_07.rst
> b/doc/guides/rel_notes/release_26_07.rst
> index d2563ac503..f6899a78c3 100644
> --- a/doc/guides/rel_notes/release_26_07.rst
> +++ b/doc/guides/rel_notes/release_26_07.rst
> @@ -95,6 +95,9 @@ New Features
> 
>    * Added support for transmitting LLDP packets based on mbuf packet type.
>    * Implemented AVX2 context descriptor transmit paths.
> +  * Prevented duplicate 'VIRTCHNL_OP_RESET_VF' during a PF-initiated
> +    reset recovery, which earlier caused virtchnl state corruption
> +    and connection loss after a top-of-rack (ToR) link flap/power-cycle.

I think something more concise here would be better.
eg. "Fixed duplicate send of 'VIRTCHNL_OP_RESET_VF' during PF reset
recovery which could cause virtchnl state corruption"

> 
>  * **Updated PCAP ethernet driver.**
> 
> diff --git a/drivers/net/intel/iavf/iavf.h b/drivers/net/intel/iavf/iavf.h
> index 2615b6f034..67aacbe7a6 100644
> --- a/drivers/net/intel/iavf/iavf.h
> +++ b/drivers/net/intel/iavf/iavf.h
> @@ -292,6 +292,13 @@ struct iavf_info {
> 
>       bool in_reset_recovery;
> 
> +     /*
> +      * Set only while iavf_handle_hw_reset()
> +      * is processing a PF-initiated reset
> +      * (vf_initiated_reset == false).
> +      */

I don't think a comment is warranted here, the variable name is
self-explanatory.

> +     bool pf_reset_in_progress;
> +
>       uint32_t ptp_caps;
>       rte_spinlock_t phc_time_aq_lock;
>  };
> diff --git a/drivers/net/intel/iavf/iavf_ethdev.c
> b/drivers/net/intel/iavf/iavf_ethdev.c
> index a8031e23a5..2b6f4daa99 100644
> --- a/drivers/net/intel/iavf/iavf_ethdev.c
> +++ b/drivers/net/intel/iavf/iavf_ethdev.c
> @@ -3166,23 +3166,27 @@ iavf_dev_close(struct rte_eth_dev *dev)
> 
>       ret = iavf_dev_stop(dev);
> 
> -     /*
> -      * Release redundant queue resource when close the dev
> -      * so that other vfs can re-use the queues.
> -      */
> -     if (vf->lv_enabled) {
> -             ret = iavf_request_queues(dev,
> IAVF_MAX_NUM_QUEUES_DFLT);
> -             if (ret)
> -                     PMD_DRV_LOG(ERR, "Reset the num of queues
> failed");
> +     /* Skip RESET_VF on a PF-initiated reset */

Regarding the comment above, here we're not skipping RESET_VF rather
preventing sending virtchnl messages to the adminq during the PF-initiated
reset. I suggest rewording the comment to reflect that.

> +     if (!vf->pf_reset_in_progress) {
> 
> -             vf->max_rss_qregion = IAVF_MAX_NUM_QUEUES_DFLT;
> -     }
> +             /*
> +              * Release redundant queue resource when close the dev
> +              * so that other vfs can re-use the queues.
> +              */
> +             if (vf->lv_enabled) {
> +                     ret = iavf_request_queues(dev,
> IAVF_MAX_NUM_QUEUES_DFLT);
> +                     if (ret)
> +                             PMD_DRV_LOG(ERR, "Reset the num of
> queues failed");
> +                     vf->max_rss_qregion =

Reply via email to