On 2/16/26 2:37 PM, Matt Coster wrote:
On 16/02/2026 11:38, Thorsten Leemhuis wrote:
On 2/16/26 11:58, Matt Coster wrote:
On 16/02/2026 10:11, Thorsten Leemhuis wrote:

We're currently trying to force this issue to reproduce on hardware we
have on hand; we'd like to see it fixed properly as much as anyone.

Yeah, no worries, I never doubted that. But getting things properly fixed
can mean "revert, fix, reapply" when it comes to regressions in Linux --
which is something that should not be seen as something bad, as Linus said
himself (see below)!

From our side at least, I don't believe this is a regression at all.
In the end what matters is: some change afaics caused systems to not work
anymore that used to be working -- that makes it a regression by the Linux
kernel's standards. And those by the same standards must be fixed, ideally
quickly. Find a few quotes on that from Linus below that explain this
better.

I feel like I should reiterate that the commit we're talking about
reverting is fundamental to support for one of the only two platforms
currently supported. And that the changes to add "support" (just
bindings and DT) for the affected Renesas platforms came several months
*after* this.

I would argue that the problem at hand is not related to any specific platform; this is a driver bug. That some platform triggers it means that the bug is real and has to be fixed, whether it turns out to be in this driver or in the PM core.

The "regression" here is that we allowed DTS changes to land for
unsupported platforms in the interest of allowing further development to
happen incrementally upstream. There has been no further progress on
that front beyond the DTS patches, however.

Those specific DTS patches were put on hold: they couldn't be applied because they would lead to a kernel crash in this driver, so the hold is to be expected.

We have never declared that
these platforms should be functional and error-free, and have taken
measures to ensure this is clear to users[1].

I would argue that we should not mix functional issues with outright kernel crashes. If the GPU misrenders something, that is a functional issue. If the GPU driver crashes the kernel, that is a kernel bug and should be fixed.

And in this case, it is the latter: the driver can trigger a kernel crash.

There are currently two platforms on which this has been reproduced:

  - Renesas Gray Hawk Single (R-Car V4M) -- this was the original report
    from Geert, and it should be noted that there are no bindings or DTS
    support for the GPU in this platform in tree at this time.
  - Renesas Salvator-X (R-Car M3-W) -- this was Geert's follow-up
    reproduction case, and the upstream bindings and DTS do contain the
    GPU, but it required adding delays to PM core code to trigger the
    race condition(?) that causes the crash.

As far as we know, there are no other situations where this crash
occurs.

It seems the crash would occur on any platform with hierarchical power domains.

Would you consider a suitable "revert" to be fully gating support for
these platforms (or even the entire group of Renesas platforms added in
this "experimental" manner just to be safe) behind the exp_hw_support
parameter until they can be properly tested? Specifically, I'm talking
about masking them off at the of_match level so that no hardware
interaction is even attempted without explicit user opt-in to
experimental hardware.

No, that is only hiding the kernel crash without actually fixing it. This is not good.
