On Fri, Dec 26, 2025 at 4:36 AM Perry Yuan <[email protected]> wrote:
>
> During Mode 1 reset, the ASIC undergoes a reset cycle and becomes
> temporarily inaccessible via PCIe. Any attempt to access MMIO registers
> during this window (e.g., from interrupt handlers or other driver threads)
> can result in uncompleted PCIe transactions, leading to NMI panics or
> system hangs.
>
> To prevent this, set the `no_hw_access` flag to true immediately after
> triggering the reset. This signals other driver components to skip
> register accesses while the device is offline.
>
> A memory barrier `smp_mb()` is added to ensure the flag update is
> globally visible to all cores before the driver enters the sleep/wait
> state.
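
A minimal user-space sketch of the ordering described above. It assumes C11 atomics as a stand-in for the kernel's plain store followed by smp_mb(). struct fake_dev and the fake_* helpers are hypothetical: they only model adev->no_hw_access and an MMIO read path, not amdgpu's real register accessors. The model makes one point explicit. The fence orders the flag store against this CPU's later accesses, but a reader only benefits if it loads the flag before every register access. A reader that passed its check just before the store can still be mid-access.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of the device; NOT struct amdgpu_device. */
struct fake_dev {
	atomic_bool no_hw_access;   /* models adev->no_hw_access */
	atomic_uint skipped_reads;  /* reads refused while the ASIC is offline */
	uint32_t regs[4];           /* models the MMIO register BAR */
};

static void fake_dev_init(struct fake_dev *dev)
{
	atomic_init(&dev->no_hw_access, false);
	atomic_init(&dev->skipped_reads, 0);
	for (int i = 0; i < 4; i++)
		dev->regs[i] = 0;
}

/* Writer: runs right after firmware has accepted the mode1 reset request. */
static void fake_enter_reset_window(struct fake_dev *dev)
{
	atomic_store_explicit(&dev->no_hw_access, true, memory_order_relaxed);
	/* Full fence: the closest C11 analogue of the kernel's smp_mb(). */
	atomic_thread_fence(memory_order_seq_cst);
}

/* Writer: runs once the ASIC is reachable over PCIe again. */
static void fake_leave_reset_window(struct fake_dev *dev)
{
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&dev->no_hw_access, false, memory_order_relaxed);
}

/* Reader (e.g. an interrupt handler): check the flag before every access. */
static uint32_t fake_reg_read(struct fake_dev *dev, unsigned int idx)
{
	if (atomic_load_explicit(&dev->no_hw_access, memory_order_relaxed)) {
		atomic_fetch_add(&dev->skipped_reads, 1);
		return 0; /* skip the bus access entirely */
	}
	return dev->regs[idx];
}
```

Calling fake_enter_reset_window() makes later fake_reg_read() calls return 0 and bump skipped_reads instead of touching regs[]. fake_leave_reset_window() restores normal reads, mirroring the clear the patch adds in amdgpu_device_mode1_reset().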

Seems like it would make sense to extend this to all ASICs that
support mode1 reset.
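
One way to read this suggestion, sketched in user space under the assumption that mode1 reset can be split into an ASIC-specific trigger plus a settle delay: keep the no_hw_access handling in one common helper so no ASIC callback has to set it itself. In the patch as posted, only the SMU 13.0.0 and SMU 14.0.2 callbacks set the flag (the latter only outside emulation mode), while the clear in amdgpu_device_mode1_reset() runs for every ASIC. struct fake_asic_ops, fake_common_mode1_reset() and the example callbacks are hypothetical names, not existing amdgpu interfaces. C11 fences stand in for smp_mb().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names throughout; nothing here exists in amdgpu. */
struct fake_dev {
	atomic_bool no_hw_access;    /* models adev->no_hw_access */
	bool flag_seen_during_wait;  /* records the flag while "sleeping" */
};

struct fake_asic_ops {
	/* ASIC-specific: ask firmware to start mode1 reset; 0 on success. */
	int (*trigger_mode1)(struct fake_dev *dev);
	/* ASIC-specific settle delay (msleep() in the real SMU code). */
	void (*wait_for_reset)(struct fake_dev *dev);
};

static void fake_dev_init(struct fake_dev *dev)
{
	atomic_init(&dev->no_hw_access, false);
	dev->flag_seen_during_wait = false;
}

/* Common path: every ASIC gets the same flag handling. */
static int fake_common_mode1_reset(struct fake_dev *dev,
				   const struct fake_asic_ops *ops)
{
	int ret = ops->trigger_mode1(dev);

	if (ret)
		return ret; /* reset never started, so MMIO stays allowed */

	atomic_store_explicit(&dev->no_hw_access, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst); /* stands in for smp_mb() */
	ops->wait_for_reset(dev);

	/* The patch clears the flag in amdgpu_device_mode1_reset(), before
	 * amdgpu_device_load_pci_state(); here it happens after the wait. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&dev->no_hw_access, false, memory_order_relaxed);
	return 0;
}

/* Example ASIC callbacks used to exercise the common path. */
static int fake_trigger_ok(struct fake_dev *dev)
{
	(void)dev;
	return 0;
}

static int fake_trigger_rejected(struct fake_dev *dev)
{
	(void)dev;
	return -1; /* e.g. firmware refused the request */
}

static void fake_wait_record(struct fake_dev *dev)
{
	dev->flag_seen_during_wait = atomic_load(&dev->no_hw_access);
}
```

With fake_trigger_ok the wait callback observes the flag set, and it is cleared again on return. With fake_trigger_rejected the flag is never set, so MMIO stays allowed. The per-ASIC code shrinks to the trigger and the delay.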

Alex

>
> Signed-off-by: Perry Yuan <[email protected]>
> Reviewed-by: Yifan Zhang <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c           | 3 +++
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 7 ++++++-
>  drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c | 9 +++++++--
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 824c5489ec85..75b1b78c0437 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5776,6 +5776,9 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
>         if (ret)
>                 goto mode1_reset_failed;
>
> +       /* enable mmio access after mode 1 reset completed */
> +       adev->no_hw_access = false;
> +
>         amdgpu_device_load_pci_state(adev->pdev);
>         ret = amdgpu_psp_wait_for_bootloader(adev);
>         if (ret)
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> index 8e35d501e81d..dcb169b25916 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> @@ -2850,8 +2850,13 @@ static int smu_v13_0_0_mode1_reset(struct smu_context *smu)
>                 break;
>         }
>
> -       if (!ret)
> +       if (!ret) {
> +               /* disable mmio access while doing mode 1 reset*/
> +               smu->adev->no_hw_access = true;
> +               /* ensure no_hw_access is globally visible before any MMIO */
> +               smp_mb();
>                 msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
> +       }
>
>         return ret;
>  }
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> index af1bc7b4350b..b1016debdf06 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> @@ -2069,10 +2069,15 @@ static int smu_v14_0_2_mode1_reset(struct smu_context *smu)
>
>         ret = smu_cmn_send_debug_smc_msg(smu, DEBUGSMC_MSG_Mode1Reset);
>         if (!ret) {
> -               if (amdgpu_emu_mode == 1)
> +               if (amdgpu_emu_mode == 1) {
>                         msleep(50000);
> -               else
> +               } else {
> +                       /* disable mmio access while doing mode 1 reset*/
> +                       smu->adev->no_hw_access = true;
> +                       /* ensure no_hw_access is globally visible before any MMIO */
> +                       smp_mb();
>                         msleep(1000);
> +               }
>         }
>
>         return ret;
> --
> 2.34.1
>
