When an eGPU is unplugged, the KFD topology for that GPU should also be
destroyed. This never happens because the fini_sw callbacks never get to
run. Call amdgpu_amdkfd_device_fini_sw() manually at the end of
amdgpu_device_fini_hw() when the device is already disconnected.

Closes: https://community.frame.work/t/amd-egpu-on-linux/8691/33
Signed-off-by: Mario Limonciello (AMD) <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 021ecc988ff79..4bac0d25547f2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -5263,6 +5263,9 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 	if (drm_dev_is_unplugged(adev_to_drm(adev)))
 		amdgpu_device_unmap_mmio(adev);
+	/* surprise hotplug */
+	if (pci_dev_is_disconnected(adev->pdev))
+		amdgpu_amdkfd_device_fini_sw(adev);
 }
 
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
-- 
2.43.0
