display: add DMU timeout recovery support

Leo Li Tue, 05 May 2026 09:36:31 -0700

On 2026-05-05 12:02, Leo Li wrote:
>> +    /*
>> +     * Compositors will refuse to make forward progress unless we send
>> +     * the previous flip's completion event.
>> +     */
>> +    if (WARN_ON(acrtc->event)) {
>> +            drm_crtc_send_vblank_event(&acrtc->base, acrtc->event);
>> +            drm_crtc_vblank_put(&acrtc->base);
>> +    }
> I would expect this WARN_ON to occur only after the 10s flip_done timeout 
> expires, allowing 'this' commit to progress with the previously armed 
> acrtc->event and ->pflip_status from the previous commit ('this' commit would 
> be gated by drm_atomic_helper_wait_for_dependencies).
> 
> In which case, we probably want to apply the same above hunk for the cursor 
> path here and also raise a warning: 
> https://elixir.bootlin.com/linux/v6.19.3/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L10170

Hmm, scratch that, looks like I didn't finish my own thought:

I think this would also mean the compositor sent 'this' commit without waiting 
for the previous vblank event. IOW it's not that compositors refuse to make 
forward progress, but wait_for_dependencies() in kernel will wait for a 
flip_done completion that will never come. And that can be a long time, since 
the drm_crtc_commit_wait()s stack.

Indeed, it seems to be the case in the dmesg log attached to this issue: 
https://gitlab.freedesktop.org/drm/amd/-/work_items/4809

The back-to-back WARN_ONs are likely from the hunk above, and one level up at 
https://elixir.bootlin.com/linux/v7.0.1/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L10186.
 The compositor attempts another commit soon after, and hits the series of 
timeouts within wait_for_dependencies(), before we hit the next series of 
WARN_ONs from the same locations.

So ultimately, I don't think sending the event here will do anything. Since by 
the time we hit that WARN_ON, we've already waited through 
wait_for_dependencies(). At which point, there aren't anymore waiters on it.

I'm thinking an alternative would be to issue a 1-2s delayed worker whenever 
prepare_flip_isr() is called. The worker is canceled whenever the event is sent 
from the irq handlers. If it runs, then the expected pflip interrupt never 
fired, so we deliver the event in the worker. It's effectively a SW fallback 
for event delivery.

Of course, it is only a fallback, ideally we figure out why interrupts were 
missed in the first place.

- Leo
Re: [PATCH v5 2/2] drm/amd/display: add DMU timeout recovery support

Reply via email to