On 16/02/2026 11:38, Thorsten Leemhuis wrote:
> On 2/16/26 11:58, Matt Coster wrote:
>> On 16/02/2026 10:11, Thorsten Leemhuis wrote:
>>
>> We're currently trying to force this issue to reproduce on hardware we
>> have on hand; we'd like to see it fixed properly as much as anyone.
>
> Yeah, no worries, I never doubted that. But getting things properly fixed
> can mean "revert, fix, reapply" when it comes to regressions in Linux --
> which is something that should not be seen as something bad, as Linus said
> himself (see below)!
>
>> From our side at least, I don't believe this is a regression at all.
>
> In the end what matters is: some change afaics caused systems to not work
> anymore that used to be working -- that makes it a regression by the Linux
> kernel's standards. And those by the same standards must be fixed, ideally
> quickly. Find a few quotes on that from Linus below that explains this
> better.

I feel like I should reiterate that the commit we're talking about
reverting is fundamental to support for one of the only two platforms
currently supported. And that the changes to add "support" (just
bindings and DT) for the affected Renesas platforms came several months
*after* this.

The "regression" here is that we allowed DTS changes to land for
unsupported platforms in the interest of allowing further development to
happen incrementally upstream. There has been no further progress on
that front beyond the DTS patches, however. We have never declared that
these platforms should be functional and error-free, and have taken
measures to ensure this is clear to users[1].

There are currently two platforms on which this has been reproduced:
- Renesas Gray Hawk Single (R-Car V4M) -- this was the original report
  from Geert, and it should be noted that there are no bindings or DTS
  support for the GPU in this platform in tree at this time.
- Renesas Salvator-X (R-Car M3-W) -- this was Geert's follow-up
  reproduction case, and the upstream bindings and DTS do contain the
  GPU, but it required adding delays to PM core code to trigger the
  race condition(?) that causes the crash.

As far as we know, there are no other situations where this crash
occurs.

Would you consider a suitable "revert" to be fully gating support for
these platforms (or even the entire group of Renesas platforms added in
this "experimental" manner just to be safe) behind the exp_hw_support
parameter until they can be properly tested? Specifically, I'm talking
about masking them off at the of_match level so that no hardware
interaction is even attempted without explicit user opt-in to
experimental hardware.
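To make the idea concrete, here is a rough userspace sketch of what masking at the match level could look like. It is not driver code and has not been run against the kernel: struct match_entry, gated_match() and the compatible strings are hypothetical names made up for illustration, and only the exp_hw_support name is borrowed from the parameter mentioned above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a match-table entry; this is NOT the
 * kernel's struct of_device_id. */
struct match_entry {
	const char *compatible;
	bool experimental;	/* true: platform has not been validated yet */
};

/* Illustrative table; the compatible strings are placeholders, not
 * real bindings. */
static const struct match_entry match_table[] = {
	{ "vendor,gpu-supported",    false },
	{ "vendor,gpu-experimental", true  },
};

/* Models the user opt-in; defaults to off. */
static bool exp_hw_support;

/* Return the matching entry, or NULL if there is no match or the match
 * is experimental and the user has not opted in. A NULL result means
 * the device is never bound, so no hardware access is attempted. */
static const struct match_entry *gated_match(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(match_table) / sizeof(match_table[0]); i++) {
		const struct match_entry *e = &match_table[i];

		if (strcmp(e->compatible, compatible) != 0)
			continue;
		if (e->experimental && !exp_hw_support)
			return NULL;
		return e;
	}
	return NULL;
}
```

The point of gating at lookup time rather than later in probe is that an experimental platform looks exactly like an unknown device unless the opt-in is set.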

Cheers,
Matt

[1]: commit 1c21f240fbc1 ("drm/imagination: Warn or error on unsupported
     hardware")
>
> Ciao, Thorsten
> ---
>
>
> On how quickly regressions should be fixed
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> * From `2026-01-22
> <https://lore.kernel.org/all/CAHk-=wheqniw_wthgo7bkkt7uib-p+ai2jp9m+z+fycz6ca...@mail.gmail.com/
> >`_::
>
> But a user complaining should basically result in an immediate fix -
> possibly a "revert and rethink".
>
> With a later clarification on `2026-01-28
> <https://lore.kernel.org/all/cahk-%3dwi86aosxs66-yi54%2bmpqjpu0upxb8zafg%[email protected]/
> >`_::
>
> It's also worth noting that "immediate" obviously doesn't mean "right
> this *second* when the problem has been reported".
>
> But if it's a regression with a known commit that caused it, I think
> the rule of thumb should generally be "within a week", preferably
> before the next rc.
>
> * From `2023-04-21
> <https://lore.kernel.org/all/CAHk-=wgD98pmSK3ZyHk_d9kZ2bhgN6DuNZMAJaV0WTtbkf=r...@mail.gmail.com/
> >`_::
>
> Known-broken commits either
> (a) get a timely fix that doesn't have other questions
> or
> (b) get reverted
>
> * From `2021-09-20(2)
> <https://lore.kernel.org/all/CAHk-=wgovmtrw1tnbmc1rn5yqytkyn0hz+sc4k0dgnn++u9...@mail.gmail.com/
> >`_::
>
> [...] review shouldn't hold up reported regressions of existing code.
> That's just basic _testing_ - either the fix should be applied, or - if
> the fix is too invasive or too ugly - the problematic source of the
> regression should be reverted.
>
> Review should be about new code, it shouldn't be holding up "there's a
> bug report, here's the obvious fix".
>
> * From `2023-05-08
> <https://lore.kernel.org/all/CAHk-=wgzU8_dGn0Yg+DyX7ammTkDUCyEJ4C=nvnhrhxkwc7...@mail.gmail.com/
> >`_::
>
> If something doesn't even build, it should damn well be fixed ASAP.
>
>
> On how fixing regressions with reverts can help prevent maintainer burnout
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> * From `2026-01-28
> <https://lore.kernel.org/all/cahk-%3dwi86aosxs66-yi54%2bmpqjpu0upxb8zafg%[email protected]/
> >`_::
>
> > So how can I/we make "immediate fixes" happen more often without
> > contributing to maintainer burnout?
>
> [...] the "revert and rethink" model [...] often a good idea in general
> unless there's just an obvious fix for an obvious bug [...]
>
> Exactly so that maintainers don't get stressed out over having a pending
> problem report that people keep pestering them about.
>
> I think people are sometimes a bit too bought into whatever changes
> they made, and reverting is seen as "too drastic", but I think it's
> often the quick and easy solution for when there isn't some obvious
> response to a regression report.
>
>
> On why the "no regressions" rule exists
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> * From `2026-01-22
> <https://lore.kernel.org/all/CAHk-=wheqniw_wthgo7bkkt7uib-p+ai2jp9m+z+fycz6ca...@mail.gmail.com/
> >`_::
>
> But the basic rule is: be so good about backwards compatibility that
> users never have to worry about upgrading. They should absolutely feel
> confident that any kernel-reported problem will either be solved, or
> have an easy solution that is appropriate for *them* (ie a
> non-technical user shouldn't be expected to be able to do a lot).
>
> Because the last thing we want is people holding back from trying new
> kernels.
>
> * From `2024-05-28
> <https://lore.kernel.org/all/CAHk-=wgtb7y-beh7tpdvdwru7zkq8-kmjz53tsk37zsppdw...@mail.gmail.com/
> >`_::
>
> I introduced that "no regressions" rule something like two decades
> ago, because people need to be able to update their kernel without
> fear of something they relied on suddenly stopping to work.
>
> * From `2018-08-03
> <https://lore.kernel.org/all/CA+55aFwWZX=cxmwdtkdgb36kf12xmtehmqjbimpcqcrg2hi...@mail.gmail.com/
> >`_::
>
> The whole point of "we do not regress" is so that people can upgrade
> the kernel and never have to worry about it.
>
> [...]
>
> Because the only thing that matters IS THE USER.
>
> * From `2017-10-26(1)
> <https://lore.kernel.org/lkml/ca+55afxw7nmamvyhkvz1upbutujewrt6yb51qax5rtrwowj...@mail.gmail.com/
> >`_::
>
> If the kernel used to work for you, the rule is that it continues to work
> for you.
>
> [...]
>
> People should basically always feel like they can update their kernel
> and simply not have to worry about it.
>
> I refuse to introduce "you can only update the kernel if you also
> update that other program" kind of limitations. If the kernel used to
> work for you, the rule is that it continues to work for you.
>
>
> On exceptions to the "no regressions" rule
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> * From `2026-01-22
> <https://lore.kernel.org/all/CAHk-=wheqniw_wthgo7bkkt7uib-p+ai2jp9m+z+fycz6ca...@mail.gmail.com/
> >`_::
>
> There are _very_ few exceptions to that rule, the main one being "the
> problem was a fundamental huge and gaping security issue and we *had* to
> make that change, and we couldn't even make your limited use-case just
> continue to work".
>
> The other exception is "the problem was reported years after it was
> introduced, and now most people rely on the new behavior".
>
> [...]
>
> Now, if it's one or two users and you can just get them to recompile,
> that's one thing. Niche hardware and odd use-cases can sometimes be
> solved that way, and regressions can sometimes be fixed by handholding
> every single reporter if the reporter is willing and able to change
> his or her workflow.
>
> * From `2023-04-20
> <https://lore.kernel.org/all/CAHk-=wis_qqy4odnynnki5b7qhosmxtoj1jxo5wmb6sruwq...@mail.gmail.com/
> >`_::
>
> And yes, I do consider "regression in an earlier release" to be a
> regression that needs fixing.
>
> There's obviously a time limit: if that "regression in an earlier
> release" was a year or more ago, and just took forever for people to
> notice, and it had semantic changes that now mean that fixing the
> regression could cause a _new_ regression, then that can cause me to
> go "Oh, now the new semantics are what we have to live with".
>
> * From `2021-09-20(3)
> <https://lore.kernel.org/all/CAHk-=wi7db2sj-wngvvsj7ak2cm556q8437soxo4ejt2bwp...@mail.gmail.com/
> >`_::
>
> Yes, we have situations where even regressions don't matter - like
> major security issues that simply cannot be fixed other ways, because
> the regression _was_ the security hole.
>
> * From `2017-10-26(2)
> <https://lore.kernel.org/lkml/ca+55afxw7nmamvyhkvz1upbutujewrt6yb51qax5rtrwowj...@mail.gmail.com/
> >`_::
>
> There have been exceptions, but they are few and far between, and they
> generally have some major and fundamental reasons for having happened,
> that were basically entirely unavoidable, and people _tried_hard_ to
> avoid them. Maybe we can't practically support the hardware any more
> after it is decades old and nobody uses it with modern kernels any
> more. Maybe there's a serious security issue with how we did things,
> and people actually depended on that fundamentally broken model. Maybe
> there was some fundamental other breakage that just _had_ to have a
> flag day for very core and fundamental reasons.
>
>
> On accepting when a regression occurred
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> * From `2026-01-22
> <https://lore.kernel.org/all/CAHk-=wheqniw_wthgo7bkkt7uib-p+ai2jp9m+z+fycz6ca...@mail.gmail.com/
> >`_::
>
> But starting to argue about users reporting breaking changes is
> basically the final line for me. I have a couple of people that I have
> in my spam block-list and refuse to have anything to do with, and they
> have generally been about exactly that.
>
> Note how it's not about making mistakes and _causing_ the regression.
> That's normal. That's development. But then arguing about it is a
> no-no.
>
> * From `2024-06-23
> <https://lore.kernel.org/all/CAHk-=wi_KMO_rJ6OCr8mAWBRg-irziM=t9wxgc+j1vvoqb3...@mail.gmail.com/
> >`_::
>
> We don't introduce regressions and then blame others.
>
> There's a very clear rule in kernel development: things that break
> other things ARE NOT FIXES.
>
> EVER.
>
> They get reverted, or the thing they broke gets fixed.
>
> * From `2021-06-05
> <https://lore.kernel.org/all/CAHk-=wiuvqhn76yuwhkjzzwtdjmmjf_zn4+u7vejjmegh3r...@mail.gmail.com/
> >`_::
>
> THERE ARE NO VALID ARGUMENTS FOR REGRESSIONS.
>
> Honestly, security people need to understand that "not working" is not
> a success case of security. It's a failure case.
>
> Yes, "not working" may be secure. But security in that case is
> *pointless*.
>
> * From `2017-10-26(5)
> <https://lore.kernel.org/lkml/CA+55aFwiiQYJ+YoLKCXjN_beDVfu38mg=ggg5lfocqhe8qi...@mail.gmail.com/
> >`_::
>
> [...] when regressions *do* occur, we admit to them and fix them,
> instead of blaming user space.
>
> The fact that you have apparently been denying the regression now for
> three weeks means that I will revert, and I will stop pulling apparmor
> requests until the people involved understand how kernel development
> is done.
>
>
> On back-and-forth
> ~~~~~~~~~~~~~~~~~
>
> * From `2024-05-28
> <https://lore.kernel.org/all/CAHk-=wgtb7y-beh7tpdvdwru7zkq8-kmjz53tsk37zsppdw...@mail.gmail.com/
> >`_::
>
> The "no regressions" rule is that we do not introduce NEW bugs.
>
> It *literally* came about because we had an endless dance of "fix two
> bugs, introduce one new one", and that then resulted in a system that
> you cannot TRUST.
>
> * From `2021-09-20(1)
> <https://lore.kernel.org/all/CAHk-=wi7db2sj-wngvvsj7ak2cm556q8437soxo4ejt2bwp...@mail.gmail.com/
> >`_::
>
>
> And the thing that makes regressions special is that back when I
> wasn't so strict about these things, we'd end up in endless "seesaw
> situations" where somebody would fix something, it would break
> something else, then that something else would break, and it would
> never actually converge on anything reliable at all.
>
> * From `2015-08-13
> <https://lore.kernel.org/all/ca+55afxk8-bsikwr_s-c+4g6wihkpqvmle34h9wozpeua6w...@mail.gmail.com/
> >`_::
>
> The strict policy of no regressions actually originally started mainly wrt
> suspend/resume issues, where the "fix one machine, break another" kind of
> back-and-forth caused endless problems, and meant that we didn't actually
> necessarily make any forward progress, just moving a problem around.
>
>
> On regressions caused by bugfixes
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> * From `2018-08-03
> <https://lore.kernel.org/all/CA+55aFwWZX=cxmwdtkdgb36kf12xmtehmqjbimpcqcrg2hi...@mail.gmail.com/
> >`_::
>
> > Kernel had a bug which has been fixed
>
> That is *ENTIRELY* immaterial.
>
> Guys, whether something was buggy or not DOES NOT MATTER.
>
> [...]
>
> It's basically saying "I took something that worked, and I broke it,
> but now it's better". Do you not see how f*cking insane that statement
> is?
--
Matt Coster
E: [email protected]
