On 16/02/2026 10:11, Thorsten Leemhuis wrote:
> [top-posting to make sure everybody is aware that involving Linus for a
> judgement call is on the table]
> 
> Geert: Many thx for the clarification. In that case:
> 
> Geert or Marek: could one of you two please submit a revert for
> 330e76d3169721 ("drm/imagination: Add power domain control") [v6.16-rc1]?
> 
> With a bit of luck the responsible drm maintainers will pick it up
> quickly -- or it might lead to a different solution for the problem. If
> neither happens within a few days, I'll point Linus to this thread and
> the revert. Then he can make the judgement call if he wants to.

Hi all,

Apologies for not replying to this sooner, but please see some thoughts
below on various parts of this discussion.

We're currently trying to force this issue to reproduce on hardware we
have on hand; we'd like to see it fixed properly as much as anyone.

> 
> Ciao, Thorsten
> 
> On 2/16/26 10:00, Geert Uytterhoeven wrote:
>> On Sat, 14 Feb 2026 at 13:38, Thorsten Leemhuis
>> <[email protected]> wrote:
>>> On 2/13/26 23:52, Marek Vasut wrote:
>>>> On 2/12/26 4:56 PM, Thorsten Leemhuis wrote:
>>>>> On 2/12/26 15:38, Marek Vasut wrote:
>>>>>> On 2/12/26 10:00 AM, Matt Coster wrote:
>>>>>>> On 11/02/2026 19:17, Marek Vasut wrote:
>>>>>>>> On 1/23/26 2:50 PM, Geert Uytterhoeven wrote:
>>>>>>>>> On Fri, 23 Jan 2026 at 14:36, Matt Coster <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>> On 22/01/2026 16:08, Geert Uytterhoeven wrote:
>>>>>>>>>>> Call the dev_pm_domain_attach_list() and
>>>>>>>>>>> dev_pm_domain_detach_list()
>>>>>>>>>>> helpers instead of open-coding multi PM Domain handling.
>>>>>>>>>>>
>>>>>>>>>>> This changes behavior slightly:
>>>>>>>>>>>      - The new handling is also applied in case of a single PM
>>>>>>>>>>> Domain,
>>>>>>>>>>>      - PM Domains are now referred to by index instead of by
>>>>>>>>>>> name, but
>>>>>>>>>>>        "make dtbs_check" enforces the actual naming and ordering
>>>>>>>>>>> anyway,
>>>>>>>>>>>      - There are no longer device links created between virtual
>>>>>>>>>>> domain
>>>>>>>>>>>        devices, only between virtual devices and the parent device.
>>>>>>>>>>
>>>>>>>>>> We still need this guarantee, both at start and end of day. In the
>>>>>>>>>> current implementation dev_pm_domain_attach_list() iterates
>>>>>>>>>> forwards,
>>>>>>>>>> but so does dev_pm_domain_detach_list(). Even if we changed that,
>>>>>>>>>> I'd
>>>>>>>>>> prefer not to rely on the implementation details when we can
>>>>>>>>>> declare the
>>>>>>>>>> dependencies explicitly.
>>>>>>>>>
>>>>>>>>> Note that on R-Car, the PM Domains are nested (see e.g.
>>>>>>>>> r8a7795_areas[]),
>>>>>>>>> so they are always (un)powered in the correct order.  But that may
>>>>>>>>> not
>>>>>>>>> be the case in the integration on other SoCs.
>>>>>>>>>
>>>>>>>>>> We had/have a patch (attached) kicking around internally to use the
>>>>>>>>>> *_list() functions but keep the inter-domain links in place; it got
>>>>>>>>>> held
>>>>>>>>>> up by discussions as to whether we actually need those dependencies
>>>>>>>>>> for
>>>>>>>>>> the hardware to behave correctly. Your patch spurred me to run
>>>>>>>>>> around
>>>>>>>>>> the office and nag people a bit, and it seems we really do need to
>>>>>>>>>> care
>>>>>>>>>> about the ordering.
>>>>>>>>>
>>>>>>>>> OK.
>>>>>>>>>
>>>>>>>>>> Can you add the links back in for a V2 or I can properly send the
>>>>>>>>>> attached patch instead, I don't mind either way.
>>>>>>>>>
>>>>>>>>> Please move forward with your patch, you are the expert.
>>>>>>>>> I prefer not to be blamed for any breakage ;-)
>>>>>>>>
>>>>>>>> Has there been any progress on fixing this kernel crash ?
>>>>>>>>
>>>>>>>> There are already two proposed solutions, but no fix is upstream.
>>>>>>>
>>>>>>> Yes and no. Our patch to use dev_pm_domain_attach_list() has landed in
>>>>>>> drm-misc-next as commit e19cc5ab347e3 ("drm/imagination: Use>>
>>>>>>> dev_pm_domain_attach_list()"), but this does not fix the underlying
>>>>>>> issue of missing synchronization in the PM core[1] is still unresolved
>>>>>>> as far as I'm aware.
>>>>>>
>>>>>> OK, but the pvr driver can currently easily crash the kernel on boot if
>>>>>> firmware is missing, so that should be fixed soon, right ?
>>>>>
>>>>> Well, drm-misc-next afaik means that the above mentioned fix would only
>>>>> be merged in 7.1, which is ~4 months away, which is not really "soon"
>>>>> I'd say. Or did I misjudge this?

The above isn't really a "fix" per se, it's just an enhancement. The
underlying crash can still happen. We could still pick it into
drm-misc-fixes and have it in the next -rc plus backported to stable,
but I'm not sure I see the value.

>>>>
>>>> The PM domain issue here crashes the kernel, so I think this would be
>>>> material for drm-misc-fixes .
>>>
>>> Yeah, sounds a lot like it.
>>>
>>>>>> I added the regressions list onto CC, because this seems like a problem
>>>>>> worth tracking.
>>>>>
>>>>> Noticed that and wondered what change caused the regression.

From our side at least, I don't believe this is a regression at all. We
haven't been able to reproduce this issue on any of the platforms we
have available (although we did stumble on a somewhat related bugfix
while trying).

>>>>
>>>> I think this one:
>>>>
>>>> 330e76d31697 ("drm/imagination: Add power domain control")

This commit added support for multiple power domains on Imagination GPUs
in essentially the same manner as other drivers (as mentioned by Geert
below). Nothing in there is specific to the Renesas platforms where this
bug can be produces and it is required to support other fully-functional
platforms such as TI AM68.

>>>
>>> Thx; FWIW, that was merged for v6.16-rc1.
>>>
>>>>> Did not
>>>>> find a answer in a quick search on lore[1]. Because if it's a
>>>>> regression, we maybe should just revert the culprit for now according to
>>>>> Linus:
>>>>> https://lore.kernel.org/lkml/CAHk-=wi86AosXs66- 
>>>>> [email protected]/
>>>>>
>>>>> [1] I guess this was the initial report from Geert?
>>>>> https://lore.kernel.org/all/ 
>>>>> camuhmdwapt40hv3c+csbqfow05awcv1a6v_nijygoyi0i9_...@mail.gmail.com/
>>>>
>>>> It is.
>>>>
>>>> I think there are other SoCs which depend on the power domain commit, so
>>>> revert is not so clear cut anymore.
>>>
>>> Well, it's a judgement call. 330e76d31697 was merged less then a year
>>> ago, so I'd not be surprised at all if Linus would revert it in a case
>>> like this. But it seems it doesn't revert clearly anymore, which
>>> complicates things.

Reverting this change feels backwards, we're saying that Imagination's
support for multiple power domains is broken because one platform has
issues when we attempt to control its power domains. I fully agree that
we should be working towards resolving this issue, but I don't agree
that ripping out this handling (that, again, is essentially the same as
handling in other GPU drivers) is a reasonable solution (even in the
short term).

As it currently stands, we're only declaring two GPUs as "fully
supported": AXE-1-16M and BXS-4-64 as made explicit in commit
1c21f240fbc1e ("drm/imagination: Warn or error on unsupported
hardware"). In order to reach the stage where that check occurs, the GPU
must be powered on so registers containing the GPU ID can be read.
However, the original error report happens long after this and would now
require the exp_hw_support module parameter to be set for this point to
be reached (as the driver would not attempt to initialise an
"unsupported" device without this explicit opt-in).

It's not currently clear to me whether the crash can be reproduced on
the affected Renesas platform without this opt-in; it's feasable that
this is the case but we know it occurs during an error unwind after
device firmware fails to initialise and the exp_hw_support check happens
much earlier than this.

Either way, we should consider adding additional "supported" checks
based on compatibile strings earlier during initialization to prevent
fundamental issues like this one from even being possible on unsupported
and/or unvalidated hardware.

All that being said, ripping out power domain handling will completely
break one of the only two platforms we currently fully support: BXS-4-64
which also has two power domains and is present in several TI SoCs (and
their DTs).

>>>
>>>> But SoCs which have hierarchical
>>>> power domains and which manage to probe this driver without having a
>>>> firmware available for the GPU will simply end with crashed kernel,
>>>> which is really not good.
>>>
>>> Does the patch Matt mentioned fix the crash? His "this does not fix the
>>> underlying issue [...]" (see quote earlier) makes it sound like the
>>> crash or some other problem (theoretical or practical? regression or
>>> not?) remains. If that's the case and no quick fix in sight I guess it
>>> would be best if someone affected could post a revert and then we can
>>> ask Linus if he wants to pick it up.

The patch I posted (and applied) is somewhat orthogonal to the issue at
hand. Geert originally suggested using the _attach_list() helper as a
way to mitigate the issue, and we agree that it's a reasonable cleanup
effort to make use of it, but I believe the "fix" in the originally
proposed patch came from removing the device links that are required on
our platform to ensure the GPU power domains come up (and down) in the
correct sequence.

>>
>> I don't think that patch would fix the crash.  The Adreno and Panfrost
>> GPU drivers do similar things (explicit multi-PM Domain handling),
>> so I am wondering if the issue can be triggered with them too (e.g. on
>> unbind).

My current understanding of the situation is that the fix proposed by
Marek in the Reneasas driver[2] works, but is not suitable since
pm_runtime_barrier() should be inserted by the caller, not the power
driver. But it seems that's not always possible (particularly when using
devm), so I don't really understand where we go from here. I still don't
see anything we're doing substantially differently (before or after the
commit I mentioned above) from anybody else.

Cheers,
Matt

[2]: https://lore.kernel.org/r/[email protected]/

-- 
Matt Coster
E: [email protected]

Attachment: OpenPGP_signature.asc
Description: OpenPGP digital signature

Reply via email to