On 2026-05-05 12:02, Leo Li wrote:
>> + /*
>> + * Compositors will refuse to make forward progress unless we send
>> + * the previous flip's completion event.
>> + */
>> + if (WARN_ON(acrtc->event)) {
>> + drm_crtc_send_vblank_event(&acrtc->base, acrtc->event);
>> + drm_crtc_vblank_put(&acrtc->base);
>> + }
> I would expect this WARN_ON to occur only after the 10s flip_done timeout
> expires, allowing 'this' commit to progress with the previously armed
> acrtc->event and ->pflip_status from the previous commit ('this' commit would
> be gated by drm_atomic_helper_wait_for_dependencies).
>
> In which case, we probably want to apply the same above hunk for the cursor
> path here and also raise a warning:
> https://elixir.bootlin.com/linux/v6.19.3/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L10170
Hmm, scratch that, looks like I didn't finish my own thought:
I think this would also mean the compositor sent 'this' commit without waiting
for the previous vblank event. IOW it's not that compositors refuse to make
forward progress, but wait_for_dependencies() in kernel will wait for a
flip_done completion that will never come. And that can be a long time, since
the drm_crtc_commit_wait()s stack.
Indeed, it seems to be the case in the dmesg log attached to this issue:
https://gitlab.freedesktop.org/drm/amd/-/work_items/4809
The back-to-back WARN_ONs are likely from the hunk above, and one level up at
https://elixir.bootlin.com/linux/v7.0.1/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L10186.
The compositor attempts another commit soon after, and hits the series of
timeouts within wait_for_dependencies(), before we hit the next series of
WARN_ONs from the same locations.
So ultimately, I don't think sending the event here will do anything. Since by
the time we hit that WARN_ON, we've already waited through
wait_for_dependencies(). At which point, there aren't anymore waiters on it.
I'm thinking an alternative would be to issue a 1-2s delayed worker whenever
prepare_flip_isr() is called. The worker is canceled whenever the event is sent
from the irq handlers. If it runs, then the expected pflip interrupt never
fired, so we deliver the event in the worker. It's effectively a SW fallback
for event delivery.
Of course, it is only a fallback, ideally we figure out why interrupts were
missed in the first place.
- Leo