On Wed, Jun 18, 2025 at 9:41 AM Mario Limonciello <supe...@kernel.org> wrote:
>
> On 6/18/2025 4:05 AM, Christian König wrote:
> > On 6/18/25 10:51, Peter Zijlstra wrote:
> >> On Tue, Jun 17, 2025 at 09:12:12PM -0500, Mario Limonciello wrote:
> >>
> >>> How about if we reset before the kexec? There is a symbol for drivers to
> >>> use to know they're about to go through kexec to do $THINGS.
> >>>
> >>> Something like this:
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> index 0fc0eeedc6461..2b1216b14d618 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> >>> @@ -34,6 +34,7 @@
> >>>
> >>>  #include <linux/cc_platform.h>
> >>>  #include <linux/dynamic_debug.h>
> >>> +#include <linux/kexec.h>
> >>>  #include <linux/module.h>
> >>>  #include <linux/mmu_notifier.h>
> >>>  #include <linux/pm_runtime.h>
> >>> @@ -2544,6 +2545,9 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
> >>>          adev->mp1_state = PP_MP1_STATE_UNLOAD;
> >>>          amdgpu_device_ip_suspend(adev);
> >>>          adev->mp1_state = PP_MP1_STATE_NONE;
> >>> +
> >>> +        if (kexec_in_progress)
> >>> +                amdgpu_asic_reset(adev);
> >>>  }
> >>>
> >>>  static int amdgpu_pmops_prepare(struct device *dev)
> >>
> >> I will throw this in the dev kernel... I'll let you know.
> >
> > Mhm if the drivers are informed about the kexec
>
> It looks like PeterZ found the symbol isn't exported; but that's not to
> say it "can't be" if it fixes this issue.
>
> > then we could also send the unload/reset packet only to the PSP IIRC.
> >
> > That might have a better chance of succeeding than a full ASIC reset.
> >
> > Lijo should know more about that.
> >
> > Regards,
> > Christian.
>
> Another idea is to do a FLR on the way down.

I think you want something like:

r = amdgpu_dpm_set_mp1_state(adev, PP_MP1_STATE_UNLOAD);

Alex