[AMD Official Use Only - AMD Internal Distribution Only]

Patches 3-10:

Reviewed-by: Feifei Xu <feifei...@amd.com>

-----Original Message-----
From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Lijo Lazar
Sent: Monday, September 2, 2024 3:34 PM
To: amd-gfx@lists.freedesktop.org
Cc: Zhang, Hawking <hawking.zh...@amd.com>; Deucher, Alexander 
<alexander.deuc...@amd.com>; Koenig, Christian <christian.koe...@amd.com>
Subject: [PATCH 00/10] Support XGMI reset on init

There are cases where a device needs to be reset before it is fully
initialized. One example is a driver reinstallation with a different
version of PSP TOS. In such a case, if the device supports a reset in which PSP
TOS is unloaded, the driver needs to reset the device first and then load the
new firmware components.

For devices in an XGMI hive, the reset needs to be sent to all devices in the
hive. Thus the driver should first discover the devices that belong to a hive
with PSP support.
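
The discovery step can be pictured with a minimal, self-contained sketch. It
is not the code from this series: struct gpu_dev, MAX_DEVS and
collect_hive_devices() are hypothetical names, and treating hive_id == 0 as
"not in any hive" is an assumption made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_DEVS 8

/* Hypothetical stand-in for a GPU; hive_id == 0 is assumed to mean "no hive". */
struct gpu_dev {
	int      index;
	uint64_t hive_id;
};

/*
 * Collect every registered device that shares a hive with 'target'.
 * Pointers are written to 'out' (capacity MAX_DEVS); the count is returned.
 * A device outside any hive yields a list that contains only itself.
 */
static size_t collect_hive_devices(struct gpu_dev *devs, size_t n,
				   struct gpu_dev *target,
				   struct gpu_dev **out)
{
	size_t count = 0;

	assert(n <= MAX_DEVS);

	if (target->hive_id == 0) {
		out[0] = target;
		return 1;
	}

	/* Only devices with a matching hive id are reset together. */
	for (size_t i = 0; i < n; i++) {
		if (devs[i].hive_id == target->hive_id)
			out[count++] = &devs[i];
	}
	return count;
}
```

Grouping by hive id rather than walking one global device list keeps devices
from other hives, and devices outside any hive, out of the reset.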

There is an existing delayed reset handler; however, it has the following
limitations:
1) It doesn't discover the devices in the hive; instead, it tries to do an XGMI
reset for all devices registered to the mgpu struct. The mgpu struct may contain
devices other than those that belong to a hive. Also, it doesn't work if there
is more than one hive.
2) It doesn't take a reset lock and, since this is a delayed reset, that could
result in unwanted hardware accesses during a reset.
3) It doesn't initialize RAS properly (left as a TODO).

This series overcomes the above limitations. Instead of marking a pending
reset, it defines init levels that specify how far initialization should
proceed. In case of a pending reset, only specific hardware blocks are
initialized.
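
As a rough illustration of the init level idea, the sketch below maps each
level to a bitmask of hardware blocks. All names here (enum hw_block,
enum init_level_id, should_init_block()) are hypothetical, and the set of
blocks chosen for the minimal level is an assumption, not necessarily what the
patches select.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical hardware block identifiers (illustrative only). */
enum hw_block {
	HW_BLOCK_COMMON,
	HW_BLOCK_GMC,
	HW_BLOCK_IH,
	HW_BLOCK_PSP,
	HW_BLOCK_SMC,
	HW_BLOCK_GFX,
	HW_BLOCK_SDMA,
	HW_BLOCK_COUNT
};

enum init_level_id {
	INIT_LEVEL_DEFAULT,	/* full initialization */
	INIT_LEVEL_MINIMAL,	/* just enough to issue a reset */
	INIT_LEVEL_COUNT
};

struct init_level {
	enum init_level_id id;
	uint32_t hw_block_mask;	/* blocks to initialize at this level */
};

#define BLOCK_BIT(b) (1u << (b))

/* Assumed block selection for the minimal level; adjust per device. */
static const struct init_level init_levels[INIT_LEVEL_COUNT] = {
	[INIT_LEVEL_DEFAULT] = { INIT_LEVEL_DEFAULT, ~0u },
	[INIT_LEVEL_MINIMAL] = { INIT_LEVEL_MINIMAL,
				 BLOCK_BIT(HW_BLOCK_COMMON) |
				 BLOCK_BIT(HW_BLOCK_GMC) |
				 BLOCK_BIT(HW_BLOCK_IH) |
				 BLOCK_BIT(HW_BLOCK_PSP) |
				 BLOCK_BIT(HW_BLOCK_SMC) },
};

/* Return true if 'block' should be brought up at the given init level. */
static bool should_init_block(enum init_level_id level, enum hw_block block)
{
	assert(level < INIT_LEVEL_COUNT && block < HW_BLOCK_COUNT);
	return (init_levels[level].hw_block_mask & BLOCK_BIT(block)) != 0;
}
```

A hardware init loop could call should_init_block() for each block and skip
those not selected, so a device that is about to be reset never brings up
blocks such as GFX or SDMA in this sketch.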

Further work (not done in this series) may add fine-grained controls for init
levels, for example skipping features such as DPM enablement, or skipping the
loading of specific firmwares that aren't required in a minimal init scenario
where the device is going to be reset.

The series also adds an interface to check whether a PSP TOS reload is required.
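
One way such a check could be shaped is an optional per-ASIC callback behind a
common wrapper. The sketch below is an assumption throughout: struct psp_ctx,
struct psp_ops, the version fields and the "versions differ" rule are
hypothetical and are not taken from the patches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct psp_ctx;

/* Hypothetical per-ASIC callback table; a NULL hook means "not supported". */
struct psp_ops {
	bool (*is_reload_needed)(const struct psp_ctx *psp);
};

struct psp_ctx {
	const struct psp_ops *ops;
	uint32_t running_tos_version;	/* assumed: 0 means no TOS is loaded */
	uint32_t image_tos_version;	/* version in the driver's firmware image */
};

/* Assumed rule: reload only if a TOS is running and its version differs. */
static bool versioned_reload_needed(const struct psp_ctx *psp)
{
	return psp->running_tos_version != 0 &&
	       psp->running_tos_version != psp->image_tos_version;
}

/* Common entry point: ASICs without the hook never request a reload. */
static bool psp_is_reload_needed(const struct psp_ctx *psp)
{
	assert(psp != NULL);
	if (!psp->ops || !psp->ops->is_reload_needed)
		return false;
	return psp->ops->is_reload_needed(psp);
}
```

Keeping the hook optional lets ASICs that cannot unload TOS through a reset
leave it unset, so the wrapper reports that no reload is needed for them.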


Lijo Lazar (10):
  drm/amdgpu: Add init levels
  drm/amdgpu: Use init level for pending_reset flag
  drm/amdgpu: Separate reinitialization after reset
  drm/amdgpu: Add reset on init handler for XGMI
  drm/amdgpu: Add helper to initialize badpage info
  drm/amdgpu: Refactor XGMI reset on init handling
  drm/amdgpu: Drop delayed reset work handler
  drm/amdgpu: Support reset-on-init on select SOCs
  drm/amdgpu: Add interface for TOS reload cases
  drm/amdgpu: Add PSP reload case to reset-on-init

 drivers/gpu/drm/amd/amdgpu/aldebaran.c        |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  21 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 245 +++++++++++-------
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |  81 ------
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h       |   1 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c       |  13 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h       |   3 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c       |  62 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h       |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c     | 148 +++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h     |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c      |  72 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h      |   2 +
 drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |  14 +-
 drivers/gpu/drm/amd/amdgpu/psp_v13_0.c        |  25 ++
 drivers/gpu/drm/amd/amdgpu/soc15.c            |   7 +
 .../gpu/drm/amd/pm/swsmu/smu11/smu_v11_0.c    |   3 +-
 17 files changed, 492 insertions(+), 214 deletions(-)

--
2.25.1
