Lo!

On 2/20/26 21:53, Dave Airlie wrote:
>
> This is the fixes and cleanups for the end of the merge window, it's
> nearly all amdgpu, with some amdkfd, then a pagemap core fix, i915/xe
> display fixes, and some xe driver fixes. Nothing seems out of the
> ordinary, except amdgpu is a little more volume than usual.
>
> Let me know if there are any issues,

Well, there were two fixes in here that made me wonder if our processes
need some optimization to get regressions fixed at least somewhat as
fast as Linus wants them to be fixed[1]:

* One fix in here was for an amdgpu regression introduced in v6.19-rc6
  (and also affecting many stable series due to backports). The fix was
  ready within ~2 days and could even have made v6.19 -- but it only
  reached mainline through this PR on Friday. IOW: after two weeks.
  Which got me wondering: "Should we do something to merge fixes like
  that faster?" And yes, it's the merge window -- but that's also when
  Arch Linux and openSUSE Tumbleweed usually jump to the latest mainline
  series and thus expose regressions like this to many users, so I guess
  it would be good to get them fixed at least as fast as outside of
  merge windows.

* One fix in here was for an i915/xe regression introduced in v6.18-rc1.
  Once reported, it took about six weeks to get fixed -- and then nearly
  10 days for the fix to reach mainline. Looking at this, I once more
  wondered if this could have been merged faster. But even more I
  wondered why the culprit wasn't reverted, as that's what Linus afaics
  wants when it takes this long.

Note, problems like these happen in other subsystems as well; I chose to
bring it up here just because these were good examples: both regressions
were reported at least three times, so they are not really corner cases.

See below for all the details.

[1] "But if it's a regression with a known commit that caused it, I
think the rule of thumb [to fix it] should generally be "within a week",
preferably before the next rc."
https://lore.kernel.org/all/cahk-%3dwi86aosxs66-yi54%2bmpqjpu0upxb8zafg%[email protected]/

> Mario Limonciello (2):
> [...]
>       drm/amd: Fix hang on amdgpu unload by using pci_dev_is_disconnected()

This is f7afda7fcd169a ("drm/amd: Fix hang on amdgpu unload by using
pci_dev_is_disconnected()") [authored: 2026-02-05 17:42:54 GMT+1;
committed: 2026-02-05 23:25:57 GMT+1 by Alex; next arrival:
next-20260209; merged: 2026-02-21 00:36:38 GMT+1; v6.19-post].

It fixes a regression that has been reported at least three times:

* On Tue, 3 Feb 2026 17:27:00 -0500 (EST):
  https://lore.kernel.org/all/[email protected]/
* On February 5, 2026 at 1:30:12 PM GMT+1:
  https://gitlab.freedesktop.org/drm/amd/-/issues/4944
* On February 18, 2026 at 9:30:39 PM GMT+1:
  https://gitlab.freedesktop.org/drm/amd/-/issues/4984

And likely a fourth time on February 7, 2026 at 7:25:40 PM GMT+1:
https://gitlab.freedesktop.org/drm/amd/-/issues/4953

The culprit is 28695ca09d3264 ("drm/amd: Clean up kfd node on surprise
disconnect") [also known as 6a23e7b4332c10; authored: 2026-01-07
22:37:28; committed: 2026-01-14 20:51:36; next arrival: next-20260119;
merged: 2026-01-16 22:48:18; v6.19-rc6 (2026-01-19 00:42:45), v6.18.7
(2026-01-23 11:21:37), v6.12.67 (2026-01-23 11:18:52), v6.6.122
(2026-01-30 10:27:43)]

Mario and Alex thus had a fix ready and committed within about two days
after the regression was first reported. It thus is an "immediate fix"
(yeah!), just how Linus wants it (see [1] above). But then it took two
weeks to get it mainlined -- and it will now take a few more days to
reach all those stable trees where it is needed, too.

Given the dates above, it could have reached 6.19 (released 2026-02-08
22:03:27 GMT+1) if we really had wanted to.

That fix could also have made the main drm PR this merge window (sent
Wed, 11 Feb 2026 17:26:03 +1000), as Alex already asked for merging on
Fri, 6 Feb 2026 14:27:06 -0500:

https://lore.kernel.org/all/CAPM=9tzgmo1pweuxjaxqoms5ptsoe8jhp9poy23q6tvy66b...@mail.gmail.com/
https://lore.kernel.org/all/[email protected]/

If it had made that pull, the fix could be in stable already by now.

Maybe Alex's PR just fell through the cracks. Happens, but overall this
still made me wonder:

(1) Should there maybe have been an additional PR this merge window to
speed things up? Or some fast track for regressions?

(2) Or should the fix (or a revert of the culprit) maybe even have been
sent to Linus for 6.19? That would have saved at least one user from
bisecting and reporting the regression (and likely a few others that
never reported it).

From Linus' mail I linked above, I'd assume he would have preferred the
second option here, even if it would have been a last-minute fix. If so:
how could we make that happen more often in the future?

Side note: yes, unbinding a module is likely something only a few users
do -- but given those three or four reports, it seems it's not that
unusual. And I don't care too much about this specific fix anyway, as
it's just an example of the "time it takes fixes for recent regressions
to reach mainline" aspect that I see all the time in many subsystems.

To elaborate on that, let me give another example:

> Imre Deak (2):
>       drm/i915/dp: Fix pipe BPP clamping due to HDR

This is now fe26ae6ac8b88f ("drm/i915/dp: Fix pipe BPP clamping due to
HDR") [authored: 2026-02-09 14:38:16 GMT+1; committed: 2026-02-12
07:03:08 GMT+1; next arrival: next-20260212; merged: 2026-02-21 00:36:38
GMT+1; v6.19-post].

That commit fixes a regression that has been reported at least three
times:

* On December 30, 2025 at 5:07:48 PM GMT+1:
  https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/15503
* On January 13, 2026 at 11:51:11 PM GMT+1:
  https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7052
* On February 15, 2026 at 10:13:48 PM GMT+1:
  https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/7269

That regression is caused by ba49a4643cf53c ("drm/i915/dp: Set min_bpp
limit to 30 in HDR mode") [authored: 2025-07-30 07:55:23 GMT+;
committed: 2025-08-19 08:32:40 GMT+; next arrival: next-20250820;
merged: 2025-10-02 21:47:25 GMT+; v6.18-rc1 (2025-10-12 22:42:36 GMT+)].

The regression took way longer to get resolved than the first example,
which makes me wonder:

(1) Should the culprit have been reverted weeks ago to get closer to the
"immediate fix" target that Linus wants?

(2) This fix also took nine days from being committed to reaching
mainline. It came a bit too late for the first drm PR this cycle. So
again: Would more frequent PRs help here? Or some fast-track path for
regression fixes?

Ciao, Thorsten
