On 10/20/25 11:32 PM, Mario Limonciello (AMD) (kernel.org) wrote:
>
>
> On 10/20/2025 12:39 PM, Rafael J. Wysocki wrote:
>> On Mon, Oct 20, 2025 at 7:28 PM Mario Limonciello (AMD) (kernel.org)
>> <[email protected]> wrote:
>>>
>>>
>>>
>>> On 10/20/2025 12:21 PM, Rafael J. Wysocki wrote:
>>>> On Mon, Oct 20, 2025 at 6:53 PM Mario Limonciello (AMD)
>>>> <[email protected]> wrote:
>>>>>
>>>>> From: Mario Limonciello <[email protected]>
>>>>>
>>>>> The PM core should be notified that thaw was skipped for the device
>>>>> so that if it's tried to be resumed (such as an aborted hibernate)
>>>>> that it gets another chance to resume.
>>>>>
>>>>> Cc: Muhammad Usama Anjum <[email protected]>
>>>>> Signed-off-by: Mario Limonciello <[email protected]>
>>>>> ---
>>>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 +-
>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> index 61268aa82df4d..d40af069f24dd 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>>>>> @@ -2681,7 +2681,7 @@ static int amdgpu_pmops_thaw(struct device *dev)
>>>>>
>>>>>          /* do not resume device if it's normal hibernation */
>>>>>          if (!pm_hibernate_is_recovering() &&
>>>>> !pm_hibernation_mode_is_suspend())
>>>>> -               return 0;
>>>>> +               return -EBUSY;
>>>>
>>>> So that's why you need the special handling of -EBUSY in the previous
>>>> patch.
>>>
>>> Yup.
>>>
>>>>
>>>> I think that you need to save some state in this driver and then use
>>>> it in subsequent callbacks instead of hacking the core to do what you
>>>> want.
>>>>
>>>
>>> The problem is the core decides "what" to call and more importantly
>>> "when" to call it.
>>>
>>> IE if the core thinks that something is thawed it will never call
>>> resume, and that's why you end up in a bad place with Muhammad's
>>> cancellation series and why I proposed this one to discuss.
>>>
>>> We could obviously go back to dropping this case entirely:
>>>
>>> if (!pm_hibernate_is_recovering() && !pm_hibernation_mode_is_suspend())
>>>
>>> But then the display turns on at thaw(), you do an unnecessary resource
>>> eviction, it takes a lot longer if you have a ton of VRAM etc.
>>
>> The cancellation series is at odds with this code path AFAICS because
>> what if hibernation is canceled after the entire thaw transition?
>
> Muhammad - did you test that specific timing of cancelling the hibernate?

Yes, I've tested the cancellations before and after the thaw both.
>>
>> Some cleanup would need to be done before thawing user space I suppose.
>
> I agree; I think that series would need changes for it.
>
> But if you put that series aside, I think this one still has some merit on
> it's own. If another driver aborted the hibernate, I think the same thing
> could happen if it happened to run before amdgpu's device thaw().
>
> That series just exposed a very "easy" way to reproduce this issue.

-- 
---
Thanks,
Usama
