Hi Ilpo,

On 11/5/2025 16:43, Ilpo Järvinen wrote:
> On Mon, 27 Oct 2025, Antheas Kapenekakis wrote:
>
>> On Mon, 27 Oct 2025 at 09:36, Shyam Sundar S K <[email protected]>
>> wrote:
>>>
>>>
>>>
>>> On 10/27/2025 13:52, Shyam Sundar S K wrote:
>>>>
>>>>
>>>> On 10/24/2025 22:02, Mario Limonciello wrote:
>>>>>
>>>>>
>>>>> On 10/24/2025 11:08 AM, Antheas Kapenekakis wrote:
>>>>>> On Fri, 24 Oct 2025 at 17:43, Mario Limonciello
>>>>>> <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 10/24/2025 10:21 AM, Antheas Kapenekakis wrote:
>>>>>>>> The ROG Xbox Ally (non-X) SoC features a similar architecture to the
>>>>>>>> Steam Deck. While the Steam Deck supports S3 (s2idle causes a crash),
>>>>>>>> this support was dropped by the Xbox Ally which only S0ix suspend.
>>>>>>>>
>>>>>>>> Since the handler is missing here, this causes the device to not
>>>>>>>> suspend
>>>>>>>> and the AMD GPU driver to crash while trying to resume afterwards
>>>>>>>> due to
>>>>>>>> a power hang.
>>>>>>>>
>>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
>>>>>>>> Signed-off-by: Antheas Kapenekakis <[email protected]>
>>>>>>>> ---
>>>>>>>> drivers/platform/x86/amd/pmc/pmc.c | 3 +++
>>>>>>>> drivers/platform/x86/amd/pmc/pmc.h | 1 +
>>>>>>>> 2 files changed, 4 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/
>>>>>>>> platform/x86/amd/pmc/pmc.c
>>>>>>>> index bd318fd02ccf..cae3fcafd4d7 100644
>>>>>>>> --- a/drivers/platform/x86/amd/pmc/pmc.c
>>>>>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
>>>>>>>> @@ -106,6 +106,7 @@ static void amd_pmc_get_ip_info(struct
>>>>>>>> amd_pmc_dev *dev)
>>>>>>>> switch (dev->cpu_id) {
>>>>>>>> case AMD_CPU_ID_PCO:
>>>>>>>> case AMD_CPU_ID_RN:
>>>>>>>> + case AMD_CPU_ID_VG:
>>>>>>>> case AMD_CPU_ID_YC:
>>>>>>>> case AMD_CPU_ID_CB:
>>>>>>>> dev->num_ips = 12;
>>>>>>>> @@ -517,6 +518,7 @@ static int amd_pmc_get_os_hint(struct
>>>>>>>> amd_pmc_dev *dev)
>>>>>>>> case AMD_CPU_ID_PCO:
>>>>>>>> return MSG_OS_HINT_PCO;
>>>>>>>> case AMD_CPU_ID_RN:
>>>>>>>> + case AMD_CPU_ID_VG:
>>>>>>>> case AMD_CPU_ID_YC:
>>>>>>>> case AMD_CPU_ID_CB:
>>>>>>>> case AMD_CPU_ID_PS:
>>>>>>>> @@ -717,6 +719,7 @@ static const struct pci_device_id
>>>>>>>> pmc_pci_ids[] = {
>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_RV) },
>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_SP) },
>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_SHP) },
>>>>>>>> + { PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_VG) },
>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD,
>>>>>>>> PCI_DEVICE_ID_AMD_1AH_M20H_ROOT) },
>>>>>>>> { PCI_DEVICE(PCI_VENDOR_ID_AMD,
>>>>>>>> PCI_DEVICE_ID_AMD_1AH_M60H_ROOT) },
>>>>>>>> { }
>>>>>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.h b/drivers/
>>>>>>>> platform/x86/amd/pmc/pmc.h
>>>>>>>> index 62f3e51020fd..fe3f53eb5955 100644
>>>>>>>> --- a/drivers/platform/x86/amd/pmc/pmc.h
>>>>>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.h
>>>>>>>> @@ -156,6 +156,7 @@ void amd_mp2_stb_deinit(struct amd_pmc_dev *dev);
>>>>>>>> #define AMD_CPU_ID_RN 0x1630
>>>>>>>> #define AMD_CPU_ID_PCO AMD_CPU_ID_RV
>>>>>>>> #define AMD_CPU_ID_CZN AMD_CPU_ID_RN
>>>>>>>> +#define AMD_CPU_ID_VG 0x1645
>>>>>>>
>>>>>>> Can you see if 0xF14 gives you a reasonable value for the idle mask if
>>>>>>> you add it to amd_pmc_idlemask_read()? Make a new define for it
>>>>>>> though,
>>>>>>> it shouldn't use the same define as 0x1a platforms.
>>>>>>
>>>>>> It does not work. Reports 0. I also tested the other ones, but the
>>>>>> 0x1a was the same as you said. All report 0x0.
>>>>>
>>>>> It's possible the platform doesn't report an idle mask.
>>>>>
>>>>> 0xF14 is where I would have expected it to report.
>>>>>
>>>>> Shyam - can you look into this to see if it's in a different place
>>>>> than 0xF14 for Van Gogh?
>>>>
>>>> Van Gogh is before Cezzane? I am bit surprised that pmc is getting
>>>> loaded there.
>>>>
>>>> Antheas - what is the output of
>>>>
>>>> #lspci -s 00:00.0
>>>
>>> OK. I get it from the diff.
>>>
>>> +#define AMD_CPU_ID_VG 0x1645
>>>
>>> S0 its 0x1645 that indicates SoC is 17h family and 90h model.
>>>
>>> What is the PMFW version running on your system?
>>> amd_pmc_get_smu_version() tells you that information.
>>
>> cat /sys/devices/platform/AMDI0005:00/smu_fw_version
>> 63.18.0
>> cat /sys/devices/platform/AMDI0005:00/smu_program
>> 7
>>
>>> Can you see if you put the scratch information same as Cezzane and if
>>> that works? i.e.
>>>
>>> AMD_PMC_SCRATCH_REG_CZN(0x94) instead of AMD_PMC_SCRATCH_REG_1AH(0xF14)
>>
>> I tried all idle masks and they return 0
>
> Hi Shyam & Antheas,
>
> This discussion seems to have died down without clear indication what's
> the best course of action here. Should I still wait?
>
> There's no particular hurry from my side but it seems Mario gave his
> Reviewed-by already and there hasn't been any follow-ups between you two,
> I'm left a bit unsure how to interpret that.
>
The thought process was to understand how we would debug the remaining 5% of failures when we do not have the idle mask concept, which was only introduced later. But both patches should work independently, so I am OK with both patch 1/3 and 2/3.

Acked-by: Shyam Sundar S K <[email protected]>

>
> In addition, is the patch 3/3 entire independent from these two PMC ones?
> (If yes, I don't know why they were submitted as a series as that just
> manages to add a little bit of uncertainty when combined into a series.)

I see a note from Mario on the cover letter that patch 3/3 can be dropped from this series and that a newer approach is being planned. So, 1/3 and 2/3 of this series can be taken.

Thanks,
Shyam

>
> Thanks in advance,
>
> --
> i.
>
>> Antheas
>>
>>> Thanks,
>>> Shyam
>>>
>>>
>>>>
>>>> 0xF14 index is meant for 1Ah (i.e. Strix and above)
>>>>
>>>>>
>>>>>>
>>>>>> Any idea why the OS hint only works 90% of the time?
>>>>
>>>> What is the output of amd_pmc_dump_registers() when 10% of the time
>>>> when the OS_HINT is not working?
>>>>
>>>> What I can surmise is, though pmc driver is sending the hint PMFW is
>>>> not taking any action (since the support in FW is missing)
>>>>
>>>>>
>>>>> If we get the idle mask reporting working we would have a better idea
>>>>> if that is what is reported wrong.
>>>>>
>>>>
>>>> IIRC, The concept of idlemask came only after cezzane that too after a
>>>> certain PMFW version. So I am not sure if idlemask actually exists.
>>>>
>>>>
>>>>> If I was to guess though; maybe GFX is still active.
>>>>>
>>>>> Depending upon what's going wrong smu_fw_info might have some more
>>>>> information too.
>>>>
>>>> That's a good point to try it out.
>>>>
>>>> Thanks,
>>>> Shyam
>>>>
>>>
>>>
>>
>>
>
