On 4/2/26 21:01, Linus Torvalds wrote:
> On Thu, 2 Apr 2026 at 11:27, Alex Deucher <[email protected]> wrote:
>> There are always new fixes.  Worse case it ends up in 7.0.1.  If it
>> causes other regressions, then we end up introducing a new regression
>> in rc7.
> This was a regression in rc1, that was reported several weeks ago.
> Anything that gets reported that early in the release cycle is bound
> to hit lots of people, because the number of people testing early rc
> kernels is relatively small.
> 
> So why pointlessly delay *known* regressions for fear of a potential new one?
> 
> And why point out rc7, when dammit, this could have been fixed long
> before and *not* be that late in the release? [...]

Thx for jumping in there. While at it: I see situations like this all
the time (I'll reply to this mail with a few current examples), except
in a few subsystems that do a good job (Jens, for example, does). Which
makes me wonder:

What can we/I do to improve things so maintainers more often handle
regressions like you want them to?

Because right now I feel like running around spending (wasting?) a lot
of time on tracking regressions and upsetting maintainers when speaking
up -- without making much of a difference in the end, unless you reply
when I CC you. But even that often just helps in the particular
situation without improving problematic workflows and habits much or at
all. The issue discussed in this thread is a pretty good example of
that: at the beginning of this cycle we discussed[1] an amdgpu
regression where a fix for a 6.19-rc6 regression affecting various
stable series could easily have gone into 6.19, but in the end was only
mainlined at the end of the 7.0 merge window. So without me complaining
again (which only helped after you jumped in), something pretty similar
could easily have been the outcome this cycle, too.

> So  honestly, there are exactly two choices: apply the fix, or just
> revert the commit that caused the problem in the first place.

BTW, Alex, thanks for pushing the fix upstream in the meantime! Oh, and
please don't take the above personally: what I say there is in no way
specific to amdgpu and/or the drm subsystem, which from what I see
handles regressions better than quite a few other subsystems. But here
it coincidentally served as a good example for a general problem.

Ciao, Thorsten

[1]
https://lore.kernel.org/all/[email protected]/
Fun fact: the fix (f7afda7fcd169a ("drm/amd: Fix hang on amdgpu unload
by using pci_dev_is_disconnected()")) was posted and committed in a
subsystem tree within two days. Nevertheless, the regression in the end
was present 54 days in the 6.6.y series, 48 days in the 6.18.y and
6.12.y series, and 45 days in 6.19.y (or three weeks more if you start
counting at 6.19-rc6, which was the first version containing the
culprit), as it took three weeks (6.18.y and 6.12.y) to five weeks
(6.19.y and 6.6.y [the two extra weeks were due to a mistake afaics])
from mainline to the affected stable trees.
