On 06-Jan-26 4:09 AM, Alex Deucher wrote:
On Fri, Dec 26, 2025 at 4:36 AM Perry Yuan <[email protected]> wrote:

During Mode 1 reset, the ASIC undergoes a reset cycle and becomes
temporarily inaccessible via PCIe. Any attempt to access MMIO registers
during this window (e.g., from interrupt handlers or other driver threads)
can result in uncompleted PCIe transactions, leading to NMI panics or
system hangs.

To prevent this, set the `no_hw_access` flag to true immediately after
triggering the reset. This signals other driver components to skip
register accesses while the device is offline.
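As a rough illustration of the check-before-access pattern described above, the userspace C sketch below models a register read that is skipped while the flag is set. `struct guarded_dev`, `guarded_dev_init()` and `guarded_rreg()` are hypothetical names, not amdgpu code. A C11 `atomic_bool` stands in for the driver's `no_hw_access` field.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define GUARDED_NREGS 16

/* Hypothetical stand-in for the device; not struct amdgpu_device. */
struct guarded_dev {
	atomic_bool no_hw_access;        /* true while the ASIC is in mode 1 reset */
	uint32_t regs[GUARDED_NREGS];    /* pretend MMIO register file */
};

static void guarded_dev_init(struct guarded_dev *dev)
{
	atomic_init(&dev->no_hw_access, false);
	for (unsigned int i = 0; i < GUARDED_NREGS; i++)
		dev->regs[i] = 0;
}

/*
 * Read register idx into *val unless hardware access is disabled.
 * Returns true on a real read, false if the access was skipped.
 */
static bool guarded_rreg(struct guarded_dev *dev, unsigned int idx, uint32_t *val)
{
	if (atomic_load(&dev->no_hw_access))
		return false;                /* device offline: do not touch MMIO */
	assert(idx < GUARDED_NREGS);
	*val = dev->regs[idx];
	return true;
}
```

A skipped read leaves `*val` untouched and returns false. Callers can therefore tell "device offline" apart from a register that genuinely reads as zero.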

A memory barrier `smp_mb()` is added to ensure the flag update is
globally visible to all cores before the driver enters the sleep/wait
state.
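The writer side can be modeled the same way. In this hedged sketch, `atomic_thread_fence(memory_order_seq_cst)` serves as the closest C11 analogue of the kernel's `smp_mb()`. `struct reset_dev`, `reset_dev_send_msg()` and the other helpers are hypothetical placeholders for the SMU message and `msleep()` calls in the patch. The order follows the diff: the flag is set only after the reset message succeeds, and the common code clears it again once the reset has completed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the state touched by the per-ASIC reset path. */
struct reset_dev {
	atomic_bool no_hw_access;
	int msgs_sent;                   /* simulated Mode1Reset messages */
};

/* Placeholder for sending the Mode1Reset message to the SMU; always succeeds here. */
static int reset_dev_send_msg(struct reset_dev *dev)
{
	dev->msgs_sent++;
	return 0;
}

/* Placeholder for msleep(); a real driver would block for the reset time. */
static void reset_dev_wait(struct reset_dev *dev)
{
	(void)dev;
}

static int reset_dev_mode1_reset(struct reset_dev *dev)
{
	int ret = reset_dev_send_msg(dev);

	if (!ret) {
		/* Mark the device offline only after the message was accepted. */
		atomic_store_explicit(&dev->no_hw_access, true, memory_order_relaxed);
		/* Full fence, standing in for smp_mb(). */
		atomic_thread_fence(memory_order_seq_cst);
		reset_dev_wait(dev);
	}
	return ret;
}

/* Mirrors the amdgpu_device.c hunk: re-enable access after the reset. */
static void reset_dev_reset_done(struct reset_dev *dev)
{
	assert(atomic_load(&dev->no_hw_access));  /* only valid after a reset */
	atomic_store(&dev->no_hw_access, false);
}
```

A barrier orders this CPU's store relative to its own later accesses. It does not stop a reader on another CPU that has already loaded the flag as false from going ahead with its MMIO access. The flag therefore narrows the race window rather than closing it.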

It seems like it would make sense to extend this to all ASICs that
support mode 1 reset.
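Here is one possible shape for that, sketched under stated assumptions. In the patch the flag is set only after the reset message has gone out, presumably because setting it earlier could also suppress the register writes used to send the message. A small shared helper could then be called from each ASIC's mode 1 reset path instead of open-coding the flag, barrier and sleep in every `*_ppt.c` file. `mode1_reset_mark_offline()`, `struct common_dev`, the `msg_ret` parameter and the wait times below are hypothetical and purely illustrative.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical device model; the field name mirrors the driver's flag. */
struct common_dev {
	atomic_bool no_hw_access;
	unsigned int slept_ms;           /* total simulated wait time */
};

/* Stand-in for msleep(): only records how long we would have slept. */
static void sim_msleep(struct common_dev *dev, unsigned int ms)
{
	dev->slept_ms += ms;
}

/*
 * Hypothetical shared helper: each ASIC's mode 1 reset path calls this
 * after the reset message was accepted, so the flag handling lives in
 * one place.
 */
static void mode1_reset_mark_offline(struct common_dev *dev, unsigned int wait_ms)
{
	atomic_store_explicit(&dev->no_hw_access, true, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);  /* analogue of smp_mb() */
	sim_msleep(dev, wait_ms);
}

/* Two per-ASIC paths; msg_ret stands in for the result of sending the message. */
static int asic_a_mode1_reset(struct common_dev *dev, int msg_ret)
{
	if (!msg_ret)
		mode1_reset_mark_offline(dev, 1000);
	return msg_ret;
}

static int asic_b_mode1_reset(struct common_dev *dev, int msg_ret)
{
	if (!msg_ret)
		mode1_reset_mark_offline(dev, 500);
	return msg_ret;
}
```

If the message fails, the helper is never reached, so the device stays accessible for error handling.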


This doesn't look like a good idea; it's more of a shortcut. Ideally, there shouldn't be any hardware access at all, since the suspend callbacks of all IP blocks are called during mode 1 reset.

The actual fix should be at the place where the driver accesses the hardware (if any); such an access indicates a logic issue within the driver.
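To make the alternative concrete: instead of a flag that makes late accesses harmless, the reset path can verify that every IP block has actually been quiesced. If one has not, the path fails and names the block that is still active, so the offending access site can be found and fixed. This is a hypothetical userspace model. `struct quiesce_dev`, `suspend_ip_blocks()` and `quiesced_mode1_reset()` are illustrative names, not amdgpu functions.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_IP_BLOCKS 3

/* Hypothetical model: each IP block may own an IRQ or worker that touches MMIO. */
struct quiesce_dev {
	bool ip_active[NUM_IP_BLOCKS];   /* true while a block may still access MMIO */
	bool in_reset;
};

/* Stand-in for running every IP block's suspend callback. */
static void suspend_ip_blocks(struct quiesce_dev *dev)
{
	for (int i = 0; i < NUM_IP_BLOCKS; i++)
		dev->ip_active[i] = false;
}

/*
 * Start mode 1 reset only if every block is idle. On failure, return -1 and
 * report the first still-active block through *culprit, so the responsible
 * access site can be fixed rather than masked by a flag.
 */
static int quiesced_mode1_reset(struct quiesce_dev *dev, int *culprit)
{
	for (int i = 0; i < NUM_IP_BLOCKS; i++) {
		if (dev->ip_active[i]) {
			*culprit = i;
			return -1;
		}
	}
	dev->in_reset = true;
	return 0;
}
```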

Thanks,
Lijo

Alex


Signed-off-by: Perry Yuan <[email protected]>
Reviewed-by: Yifan Zhang <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c           | 3 +++
  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 7 ++++++-
  drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c | 9 +++++++--
  3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 824c5489ec85..75b1b78c0437 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5776,6 +5776,9 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
         if (ret)
                 goto mode1_reset_failed;

+       /* enable mmio access after mode 1 reset completed */
+       adev->no_hw_access = false;
+
         amdgpu_device_load_pci_state(adev->pdev);
         ret = amdgpu_psp_wait_for_bootloader(adev);
         if (ret)
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
index 8e35d501e81d..dcb169b25916 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
@@ -2850,8 +2850,13 @@ static int smu_v13_0_0_mode1_reset(struct smu_context *smu)
                 break;
         }

-       if (!ret)
+       if (!ret) {
+               /* disable mmio access while doing mode 1 reset*/
+               smu->adev->no_hw_access = true;
+               /* ensure no_hw_access is globally visible before any MMIO */
+               smp_mb();
                 msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
+       }

         return ret;
  }
diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
index af1bc7b4350b..b1016debdf06 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
@@ -2069,10 +2069,15 @@ static int smu_v14_0_2_mode1_reset(struct smu_context *smu)

         ret = smu_cmn_send_debug_smc_msg(smu, DEBUGSMC_MSG_Mode1Reset);
         if (!ret) {
-               if (amdgpu_emu_mode == 1)
+               if (amdgpu_emu_mode == 1) {
                         msleep(50000);
-               else
+               } else {
+                       /* disable mmio access while doing mode 1 reset*/
+                       smu->adev->no_hw_access = true;
+                       /* ensure no_hw_access is globally visible before any MMIO */
+                       smp_mb();
                         msleep(1000);
+               }
         }

         return ret;
--
2.34.1

