On Fri, Dec 26, 2025 at 4:36 AM Perry Yuan <[email protected]> wrote:
>
> During Mode 1 reset, the ASIC undergoes a reset cycle and becomes
> temporarily inaccessible via PCIe. Any attempt to access MMIO registers
> during this window (e.g., from interrupt handlers or other driver threads)
> can result in uncompleted PCIe transactions, leading to NMI panics or
> system hangs.
>
> To prevent this, set the `no_hw_access` flag to true immediately after
> triggering the reset. This signals other driver components to skip
> register accesses while the device is offline.
>
> A memory barrier `smp_mb()` is added to ensure the flag update is
> globally visible to all cores before the driver enters the sleep/wait
> state.

Seems like it would make sense to extend this to all asics which
support mode1 reset.

Alex

>
> Signed-off-by: Perry Yuan <[email protected]>
> Reviewed-by: Yifan Zhang <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c           | 3 +++
>  drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c | 7 ++++++-
>  drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c | 9 +++++++--
>  3 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 824c5489ec85..75b1b78c0437 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5776,6 +5776,9 @@ int amdgpu_device_mode1_reset(struct amdgpu_device *adev)
>  	if (ret)
>  		goto mode1_reset_failed;
>
> +	/* enable mmio access after mode 1 reset completed */
> +	adev->no_hw_access = false;
> +
>  	amdgpu_device_load_pci_state(adev->pdev);
>  	ret = amdgpu_psp_wait_for_bootloader(adev);
>  	if (ret)
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> index 8e35d501e81d..dcb169b25916 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_0_ppt.c
> @@ -2850,8 +2850,13 @@ static int smu_v13_0_0_mode1_reset(struct smu_context *smu)
>  		break;
>  	}
>
> -	if (!ret)
> +	if (!ret) {
> +		/* disable mmio access while doing mode 1 reset*/
> +		smu->adev->no_hw_access = true;
> +		/* ensure no_hw_access is globally visible before any MMIO */
> +		smp_mb();
>  		msleep(SMU13_MODE1_RESET_WAIT_TIME_IN_MS);
> +	}
>
>  	return ret;
>  }
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> index af1bc7b4350b..b1016debdf06 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> @@ -2069,10 +2069,15 @@ static int smu_v14_0_2_mode1_reset(struct smu_context *smu)
>
>  	ret = smu_cmn_send_debug_smc_msg(smu, DEBUGSMC_MSG_Mode1Reset);
>  	if (!ret) {
> -		if (amdgpu_emu_mode == 1)
> +		if (amdgpu_emu_mode == 1) {
>  			msleep(50000);
> -		else
> +		} else {
> +			/* disable mmio access while doing mode 1 reset*/
> +			smu->adev->no_hw_access = true;
> +			/* ensure no_hw_access is globally visible before any MMIO */
> +			smp_mb();
>  			msleep(1000);
> +		}
>  	}
>
>  	return ret;
> --
> 2.34.1
>
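
For illustration only, here is a minimal userspace sketch of the flag-plus-barrier pattern the commit message describes, arranged as one shared begin/end pair of the kind that could serve every ASIC with mode1 reset. It is not amdgpu code: it uses C11 atomics where the patch uses a plain bool and smp_mb(), and struct fake_dev, fake_mode1_reset_begin(), fake_mode1_reset_end() and fake_rreg32() are hypothetical names made up for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for device state; not the real struct amdgpu_device. */
struct fake_dev {
	atomic_bool no_hw_access;	/* true while the device is off the bus */
	uint32_t reg;			/* pretend MMIO register */
	atomic_uint skipped;		/* reads refused during the reset window */
};

/* Shared entry point: block register access before waiting out the reset. */
static void fake_mode1_reset_begin(struct fake_dev *dev)
{
	atomic_store_explicit(&dev->no_hw_access, true, memory_order_relaxed);
	/* Full fence, standing in for the smp_mb() in the patch. */
	atomic_thread_fence(memory_order_seq_cst);
	/* The driver would msleep() here while the ASIC resets. */
}

/* Shared exit point: allow register access again once the reset is done. */
static void fake_mode1_reset_end(struct fake_dev *dev)
{
	atomic_store_explicit(&dev->no_hw_access, false, memory_order_seq_cst);
}

/* Guarded read: returns false and leaves *val untouched while access is blocked. */
static bool fake_rreg32(struct fake_dev *dev, uint32_t *val)
{
	if (atomic_load_explicit(&dev->no_hw_access, memory_order_seq_cst)) {
		atomic_fetch_add(&dev->skipped, 1);
		return false;
	}
	*val = dev->reg;
	return true;
}
```

With a pair like this, each per-ASIC mode1 reset callback would only send its own reset message and sleep for its own wait time, and the flag handling would live in one place. The sketch does not cover a reader that has already passed the flag check when the reset is triggered; whether that window matters in the driver is a separate question.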
