On 2026. január 26., hétfő 14:00:59 közép-európai téli idő Christian König wrote: > On 1/26/26 11:27, Michel Dänzer wrote: > > On 1/26/26 11:14, Christian König wrote: > >> On 1/23/26 15:44, Timur Kristóf wrote: > >>> On Friday, January 23, 2026 2:52:44 PM Central European Standard Time > >>> > >>> Christian König wrote: > >>>> So as far as I can see the whole approach doesn't make any sense at > >>>> all. > >>> > >>> Actually this approach was proposed as a solution at XDC 2025 in Harry's > >>> presentation, "DRM calls driver callback to attempt recovery", see page > >>> 9 in this slide deck: > >>> > >>> https://indico.freedesktop.org/event/10/contributions/431/attachments/ > >>> 267/355/2025%20XDC%20Hackfest%20Update%20v1.2.pdf > >>> > >>> If you disagree with Harry, please make a counter-proposal. > >> > >> Well I must have missed that detail otherwise I would have objected. > >> > >> But looking at the slide Harry actually pointed out what immediately came > >> to my mind as well, e.g. that the Compositor needs to issue a full > >> modeset to re-program the CRTC.> > > In principle, the kernel driver has all the information it needs to > > reprogram the HW by itself. Not sure why the compositor would need to be > > actively involved. > Well first of all I'm not sure if we can reprogram the HW even if all > information are available. > > Please keep in mind that we are in a dma_fence timeout handler here with the > usual rat tail of consequences. So no allocation of memory or taking locks > under which memory is allocated or are part of preparing the page flip > etc... I'm not so deep in the atomic code, so Alex, Sima and probably you > as well can answer that much better than I do, but of hand it sounds > questionable. > > On the other hand we could of course postpone reprogramming the CRTC into an > async work item, but that might created more problems then it solves. > > Then second even if the kernel can do it I'm not sure if it should do it. > > I mean userspace asked for a quick page flip and not some expensive CRTC/PLL > reprogramming. Stuff like that usually takes some time and by then the > frame which should be displayed by the page flip might already be stale and > it would be better to tell userspace that we couldn't display it on time > and wait for a new frame to be generated.
I agree with Michel here. It's a kernel bug, it should be solved by the kernel. I don't like the tendency of pushing userspace to handle kernel bugs, especially if this is just needed for one vendor's buggy driver. (No offence.) > > And third, there must be a root cause of the page flip not completing. > > My educated guess is that we have some atomic property change or even > turning the CRTC off in parallel with the page flip. I mean HW rarely turns > off its reoccurring vblank interrupt on its own. I think a page flip timeout can many root causes, so it's unlikely that a single solution would solve all of it. See: https://gitlab.freedesktop.org/drm/amd/-/issues? label_name[]=page%20flip%20timeout Best regards, Timur
