On 2026-05-05 12:36, Leo Li wrote:
>
>
> On 2026-05-05 12:02, Leo Li wrote:
>>> + /*
>>> + * Compositors will refuse to make forward progress unless we send
>>> + * the previous flip's completion event.
>>> + */
>>> + if (WARN_ON(acrtc->event)) {
>>> + drm_crtc_send_vblank_event(&acrtc->base, acrtc->event);
>>> + drm_crtc_vblank_put(&acrtc->base);
>>> + }
>> I would expect this WARN_ON to occur only after the 10s flip_done timeout
>> expires, allowing 'this' commit to progress with the previously armed
>> acrtc->event and ->pflip_status from the previous commit ('this' commit
>> would be gated by drm_atomic_helper_wait_for_dependencies).
>>
>> In which case, we probably want to apply the same above hunk for the cursor
>> path here and also raise a warning:
>> https://elixir.bootlin.com/linux/v6.19.3/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L10170
>
> Hmm, scratch that, looks like I didn't finish my own thought:
>
> I think this would also mean the compositor sent 'this' commit without
> waiting for the previous vblank event. IOW it's not that compositors refuse
> to make forward progress, but wait_for_dependencies() in kernel will wait for
> a flip_done completion that will never come. And that can be a long time,
> since the drm_crtc_commit_wait()s stack.
>
> Indeed, it seems to be the case in the dmesg log attached to this issue:
> https://gitlab.freedesktop.org/drm/amd/-/work_items/4809
>
> The back-to-back WARN_ONs are likely from the hunk above, and one level up at
> https://elixir.bootlin.com/linux/v7.0.1/source/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c#L10186.
> The compositor attempts another commit soon after, and hits the series of
> timeouts within wait_for_dependencies(), before we hit the next series of
> WARN_ONs from the same locations.
>
> So ultimately, I don't think sending the event here will do anything. Since
> by the time we hit that WARN_ON, we've already waited through
> wait_for_dependencies(). At which point, there aren't anymore waiters on it.
>
> I'm thinking an alternative would be to issue a 1-2s delayed worker whenever
> prepare_flip_isr() is called. The worker is canceled whenever the event is
> sent from the irq handlers. If it runs, then the expected pflip interrupt
> never fired, so we deliver the event in the worker. It's effectively a SW
> fallback for event delivery.
>
> Of course, it is only a fallback, ideally we figure out why interrupts were
> missed in the first place.
>
> - Leo
(Apologies for the spam)
But regarding this patch, I'm OK with dropping the above hunk but leaving the
dm_helpers_dmu_timeout() implementation.
With that change, this is
Reviewed-by: Leo Li <[email protected]>