On 2026-05-06 16:00, Leo Li wrote:
>
>
> On 2026-05-04 16:54, Timur Kristóf wrote:
>> On Monday, May 4, 2026 8:36:49 PM Central European Summer Time
>> [email protected] wrote:
>>> From: Leo Li <[email protected]>
>>>
>>> [Why]
>>>
>>> VStartup is an OTG event that fires when the pixel pipeline prepares for
>>> pixel scanout of the next frame. It was previously used to deliver
>>> vblank events for commits that do not trigger a fb address update, and
>>> hence a pflip interrupt (hw cursor updates, for example).
>>>
>>> The issue with vstartup is that HW can mask the interrupt in cases where
>>> idle optimizations are enabled or when a HW lock is active. This could
>>> explain the range of flip_done timeouts frequently seen in the wild.
>> Can you help me understand how that could happen with vstartup?
>> Specifically, what is a "HW lock" and when is it active?
>
> Hi Timur,
>
> I should've prefaced this patch by saying that this is a theoretical fix. I
> haven't been able to reproduce the timeout issues myself, and this patch came
> out of internal discussions with folks more familiar with the HW. I don't
> think this will fix *all* cases of flip_done timeouts, but it may address some
> of them.
>
> (But timeouts aside, we *should* transition to vline since it's more reliable
> than vstartup.)
>
> To answer your questions: depending on the DCN generation, a few things can
> affect whether vstartup fires:
>
> * DPG - DCN can Dynamically Power Gate parts of the display pipe when a
> self-refresh capable eDP is connected. DPG is engaged when there are enough
> static frames (detected through drm_vblank_off). Once gated, vstartup is
> masked even though the OTG (output timing generator) is still enabled. vline
> is unaffected.
>
> * GSL - Driver can use the Global Sync Lock to block HW from latching
> double-buffered registers during programming, so that HW does not latch a
> partially programmed state. This will mask vstartup, but vline is
> unaffected. See dcn20_pipe_control_lock().
>
> * MALL - A DCN-accessible cache introduced in DCN32+ DGPUs that can store fb
> data to allow for longer DRAM sleep. When scanning out from MALL, vstartup is
> masked; vline is unaffected.
>
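The three cases above can be condensed into a small model. This is only a sketch of the behaviour as described in this thread, not driver code: `struct otg_state`, its fields, and both helper functions are hypothetical names made up for illustration, and the real conditions depend on the DCN generation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state discussed above; not taken from amdgpu. */
struct otg_state {
	bool otg_enabled;   /* output timing generator is running */
	bool dpg_gated;     /* display pipe is dynamically power gated */
	bool gsl_locked;    /* Global Sync Lock held during programming */
	bool mall_scanout;  /* scanout is served from MALL */
};

/* Per the description above, any of the three conditions masks vstartup. */
bool vstartup_fires(const struct otg_state *s)
{
	return s->otg_enabled && !s->dpg_gated &&
	       !s->gsl_locked && !s->mall_scanout;
}

/* vline is described as unaffected as long as the OTG is active. */
bool vline_fires(const struct otg_state *s)
{
	return s->otg_enabled;
}

/* Example: scanning out from MALL masks vstartup but not vline. */
void demo_mall_case(void)
{
	struct otg_state s = { .otg_enabled = true, .mall_scanout = true };

	assert(!vstartup_fires(&s));
	assert(vline_fires(&s));
}
```

In this model, a vblank event tied to vstartup is lost whenever one of the three flags is set, while one tied to vline is still delivered.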
>>
>> Many users have experienced flip_done timeouts while playing games.
>> In that scenario, would any idle optimization be enabled or is there a "HW
>> lock"?
>
> If the game stops submitting frames for ~15 refresh cycles, it's possible that
> PSR kicks in, though I know there are plenty of reporters running on external
> monitors without PSR support. If it's DGPUs, it's very likely due to MALL. A
> reporter I was debugging with said disabling MALL showed good results[1]. If
> it's an APU with an external monitor, then that's less clear.
>
> A lot of the reporters seem to be running Phoenix (DCN314), with a common
> symptom of DMUB timing out[2]. If a self-refresh panel is involved, then I'm
> curious if this vline2 patch would help. Hamza's recent patch[3] that enables
> various levels of reset may help to mitigate, but it doesn't fix the root
> cause. I'm planning a branch with this patch and [3], along with debug dumps
> on flip_done timeouts for reporters to try.
>
> [1]https://lore.kernel.org/amd-gfx/[email protected]/
> [2]https://gitlab.freedesktop.org/drm/amd/-/work_items/4831
> [3]https://lore.kernel.org/lkml/[email protected]/
>
>>
>>> DCN hardware provides 3 generic OTG interrupts that can be programmed to
>>> fire on a specific line. Vline 0 and 1 are currently reserved, with
>>> vline2 available to use for event delivery. These interrupts cannot
>>> be masked, as long as the OTG is active.
>>>
>>> [How]
>>>
>>> Switch to vline2 for vblank handling. Today, DC will program the
>>> vline2 position to be at vupdate -- the point at which HW latches to
>>> double-buffered registers.
>>>
>>> Since all the vline interrupt types share the same interrupt src_id,
>>> refactor the existing vline0 infrastructure to allow for all the vline0,
>>> 1, and 2 types.
>>>
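One way to picture a single interrupt source serving all three vline types is a per-type handler table. The sketch below is purely illustrative and is not the patch's code: `struct vline_table`, `vline_register()`, `vline_dispatch()`, and the `fired_mask` bitmask are all assumptions made for the example, and how the hardware actually reports which vline fired is not covered in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vline types sharing one interrupt source. */
enum vline_type { VLINE0 = 0, VLINE1, VLINE2, VLINE_COUNT };

typedef void (*vline_handler_t)(void *ctx);

/* One handler slot per vline type (illustrative layout only). */
struct vline_table {
	vline_handler_t handler[VLINE_COUNT];
	void *ctx[VLINE_COUNT];
};

/* Install a handler for one vline type; returns 0 on success, -1 otherwise. */
int vline_register(struct vline_table *t, enum vline_type type,
		   vline_handler_t fn, void *ctx)
{
	if ((int)type < 0 || (int)type >= VLINE_COUNT || fn == NULL)
		return -1;
	t->handler[type] = fn;
	t->ctx[type] = ctx;
	return 0;
}

/*
 * Called once per shared interrupt. fired_mask is an assumed bitmask in
 * which bit N set means vline N fired. Returns the number of handlers run.
 */
int vline_dispatch(const struct vline_table *t, unsigned int fired_mask)
{
	int called = 0;

	for (int i = 0; i < VLINE_COUNT; i++) {
		if ((fired_mask & (1u << i)) && t->handler[i] != NULL) {
			t->handler[i](t->ctx[i]);
			called++;
		}
	}
	return called;
}

/* Example handler: counts delivered events through an int pointer. */
void count_event(void *ctx)
{
	(*(int *)ctx)++;
}
```

With only a vline2 handler registered, a dispatch for vline0 or vline1 runs nothing, which mirrors keeping the reserved types separate from vblank event delivery.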
>>> Since this is intended to replace vstartup for DCN, use the same handler
>>> logic, but be careful to leave DCE on vstartup.
>> Why not also switch DCE?
>> Does DCE not have the vline interrupts or does it not have the same issue
>> with the vstartup interrupt?
>
> I didn't want to touch DCE since I don't have information on how these
> interrupts behave on them, and I didn't want to regress anything. Would need
> to do some digging to find out.
>
DCE's architecture is quite different in this regard. There are no VSTARTUP or
VUPDATE signals and interrupts on DCE.

Harry
> - Leo
>
>>
>>> Signed-off-by: Leo Li <[email protected]>
>> I think this patch should have a "Fixes:" tag or another way to indicate
>> that it should be backported to stable kernels.
>>
>> Thanks,
>> Timur
>