On Wed, 5 Nov 2025 at 12:28, Shyam Sundar S K <[email protected]> wrote:
>
> Hi Ilpo,
>
> On 11/5/2025 16:43, Ilpo Järvinen wrote:
> > On Mon, 27 Oct 2025, Antheas Kapenekakis wrote:
> >
> >> On Mon, 27 Oct 2025 at 09:36, Shyam Sundar S K <[email protected]>
> >> wrote:
> >>>
> >>>
> >>>
> >>> On 10/27/2025 13:52, Shyam Sundar S K wrote:
> >>>>
> >>>>
> >>>> On 10/24/2025 22:02, Mario Limonciello wrote:
> >>>>>
> >>>>>
> >>>>> On 10/24/2025 11:08 AM, Antheas Kapenekakis wrote:
> >>>>>> On Fri, 24 Oct 2025 at 17:43, Mario Limonciello
> >>>>>> <[email protected]> wrote:
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> On 10/24/2025 10:21 AM, Antheas Kapenekakis wrote:
> >>>>>>>> The ROG Xbox Ally (non-X) SoC features a similar architecture to the
> >>>>>>>> Steam Deck. While the Steam Deck supports S3 (s2idle causes a crash),
> >>>>>>>> this support was dropped by the Xbox Ally which only supports S0ix
> >>>>>>>> suspend.
> >>>>>>>>
> >>>>>>>> Since the handler is missing here, this causes the device to not
> >>>>>>>> suspend and the AMD GPU driver to crash while trying to resume
> >>>>>>>> afterwards due to a power hang.
> >>>>>>>>
> >>>>>>>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
> >>>>>>>> Signed-off-by: Antheas Kapenekakis <[email protected]>
> >>>>>>>> ---
> >>>>>>>>  drivers/platform/x86/amd/pmc/pmc.c | 3 +++
> >>>>>>>>  drivers/platform/x86/amd/pmc/pmc.h | 1 +
> >>>>>>>>  2 files changed, 4 insertions(+)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.c b/drivers/platform/x86/amd/pmc/pmc.c
> >>>>>>>> index bd318fd02ccf..cae3fcafd4d7 100644
> >>>>>>>> --- a/drivers/platform/x86/amd/pmc/pmc.c
> >>>>>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.c
> >>>>>>>> @@ -106,6 +106,7 @@ static void amd_pmc_get_ip_info(struct amd_pmc_dev *dev)
> >>>>>>>>  	switch (dev->cpu_id) {
> >>>>>>>>  	case AMD_CPU_ID_PCO:
> >>>>>>>>  	case AMD_CPU_ID_RN:
> >>>>>>>> +	case AMD_CPU_ID_VG:
> >>>>>>>>  	case AMD_CPU_ID_YC:
> >>>>>>>>  	case AMD_CPU_ID_CB:
> >>>>>>>>  		dev->num_ips = 12;
> >>>>>>>> @@ -517,6 +518,7 @@ static int amd_pmc_get_os_hint(struct amd_pmc_dev *dev)
> >>>>>>>>  	case AMD_CPU_ID_PCO:
> >>>>>>>>  		return MSG_OS_HINT_PCO;
> >>>>>>>>  	case AMD_CPU_ID_RN:
> >>>>>>>> +	case AMD_CPU_ID_VG:
> >>>>>>>>  	case AMD_CPU_ID_YC:
> >>>>>>>>  	case AMD_CPU_ID_CB:
> >>>>>>>>  	case AMD_CPU_ID_PS:
> >>>>>>>> @@ -717,6 +719,7 @@ static const struct pci_device_id pmc_pci_ids[] = {
> >>>>>>>>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_RV) },
> >>>>>>>>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_SP) },
> >>>>>>>>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_SHP) },
> >>>>>>>> +	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, AMD_CPU_ID_VG) },
> >>>>>>>>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M20H_ROOT) },
> >>>>>>>>  	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_1AH_M60H_ROOT) },
> >>>>>>>>  	{ }
> >>>>>>>> diff --git a/drivers/platform/x86/amd/pmc/pmc.h b/drivers/platform/x86/amd/pmc/pmc.h
> >>>>>>>> index 62f3e51020fd..fe3f53eb5955 100644
> >>>>>>>> --- a/drivers/platform/x86/amd/pmc/pmc.h
> >>>>>>>> +++ b/drivers/platform/x86/amd/pmc/pmc.h
> >>>>>>>> @@ -156,6 +156,7 @@ void amd_mp2_stb_deinit(struct amd_pmc_dev *dev);
> >>>>>>>>  #define AMD_CPU_ID_RN 0x1630
> >>>>>>>>  #define AMD_CPU_ID_PCO AMD_CPU_ID_RV
> >>>>>>>>  #define AMD_CPU_ID_CZN AMD_CPU_ID_RN
> >>>>>>>> +#define AMD_CPU_ID_VG 0x1645
> >>>>>>>
> >>>>>>> Can you see if 0xF14 gives you a reasonable value for the idle mask if
> >>>>>>> you add it to amd_pmc_idlemask_read()? Make a new define for it
> >>>>>>> though, it shouldn't use the same define as 0x1a platforms.
> >>>>>>
> >>>>>> It does not work. Reports 0. I also tested the other ones, but the
> >>>>>> 0x1a was the same as you said. All report 0x0.
> >>>>>
> >>>>> It's possible the platform doesn't report an idle mask.
> >>>>>
> >>>>> 0xF14 is where I would have expected it to report.
> >>>>>
> >>>>> Shyam - can you look into this to see if it's in a different place
> >>>>> than 0xF14 for Van Gogh?
> >>>>
> >>>> Van Gogh is before Cezanne? I am a bit surprised that pmc is getting
> >>>> loaded there.
> >>>>
> >>>> Antheas - what is the output of
> >>>>
> >>>> #lspci -s 00:00.0
> >>>
> >>> OK. I get it from the diff.
> >>>
> >>> +#define AMD_CPU_ID_VG 0x1645
> >>>
> >>> So it's 0x1645 that indicates the SoC is 17h family and 90h model.
> >>>
> >>> What is the PMFW version running on your system?
> >>> amd_pmc_get_smu_version() tells you that information.
> >>
> >> cat /sys/devices/platform/AMDI0005:00/smu_fw_version
> >> 63.18.0
> >> cat /sys/devices/platform/AMDI0005:00/smu_program
> >> 7
> >>
> >>> Can you see if you put the scratch information same as Cezanne and if
> >>> that works? i.e.
> >>>
> >>> AMD_PMC_SCRATCH_REG_CZN(0x94) instead of AMD_PMC_SCRATCH_REG_1AH(0xF14)
> >>
> >> I tried all idle masks and they return 0
> >
> > Hi Shyam & Antheas,
> >
> > This discussion seems to have died down without clear indication what's
> > the best course of action here. Should I still wait?
> >
> > There's no particular hurry from my side but it seems Mario gave his
> > Reviewed-by already and there haven't been any follow-ups between you two,
> > I'm left a bit unsure how to interpret that.
> >
>
> The thought process was to understand how we debug the remaining 5% of
> failures when we do not have the idlemask concept, which was introduced
> some time later. But both the patches should work independently, so I
> am ok with both patch 1/3 and 2/3.
>
> Acked-by: Shyam Sundar S K <[email protected]>
>
>
> >
> > In addition, is the patch 3/3 entirely independent from these two PMC ones?
> > (If yes, I don't know why they were submitted as a series as that just
> > manages to add a little bit of uncertainty when combined into a series.)
>
> I see a note from Mario on the cover letter that the patch 3/3 can be
> dropped from this series and a newer approach is being planned.

To be more specific, patch 3 became two separate patches that went
through drm. For the rare failure, it would be an additional patch (if
appropriate) that does not affect 1 and 2.

Do you have any idea of where the failure for the other 5% of cases
comes from? I noticed that after I hibernated my device and it booted
up, it would never go into LPS0 and the OS hint stopped working. Would
that be a hint?

Antheas

> So, 1/3 and 2/3 of this series can be taken.
>
> Thanks,
> Shyam
>
>
> > Thanks in advance,
> >
> > --
> > i.
> >
> >> Antheas
> >>
> >>> Thanks,
> >>> Shyam
> >>>
> >>>
> >>>>
> >>>> 0xF14 index is meant for 1Ah (i.e. Strix and above)
> >>>>
> >>>>>
> >>>>>>
> >>>>>> Any idea why the OS hint only works 90% of the time?
> >>>>
> >>>> What is the output of amd_pmc_dump_registers() during the 10% of the
> >>>> time when the OS_HINT is not working?
> >>>>
> >>>> What I can surmise is, though pmc driver is sending the hint PMFW is
> >>>> not taking any action (since the support in FW is missing)
> >>>>
> >>>>>
> >>>>> If we get the idle mask reporting working we would have a better idea
> >>>>> if that is what is reported wrong.
> >>>>>
> >>>>
> >>>> IIRC, the concept of idlemask came only after Cezanne, and that too
> >>>> after a certain PMFW version. So I am not sure if idlemask actually
> >>>> exists.
> >>>>
> >>>>
> >>>>> If I was to guess though; maybe GFX is still active.
> >>>>>
> >>>>> Depending upon what's going wrong smu_fw_info might have some more
> >>>>> information too.
> >>>>
> >>>> That's a good point to try it out.
> >>>>
> >>>> Thanks,
> >>>> Shyam
> >>>>
> >>>
> >>>
> >>
> >>
> >
> >
