On 2026-05-04 16:54, Timur Kristóf wrote:
> On Monday, May 4, 2026 8:36:49 PM Central European Summer Time 
> [email protected] wrote:
>> From: Leo Li <[email protected]>
>>
>> [Why]
>>
>> VStartup is an OTG event that fires when the pixel pipeline prepares for
>> pixel scanout of the next frame. It was previously used to deliver
>> vblank events for commits that do not trigger a fb address update, and
>> hence a pflip interrupt (hw cursor updates, for example).
>>
>> The issue with vstartup is that HW can mask the interrupt in cases where
>> idle optimizations are enabled or when a HW lock is active. This could
>> explain the range of flip_done timeouts frequently seen in the wild.
> Can you help me understand how that could happen with vstartup?
> Specifically, what is a "HW lock" and when is it active?

Hi Timur,

I should've prefaced this patch by saying that it's a theoretical fix. I haven't
been able to reproduce the timeout issues myself, and this patch came out of
internal discussions with folks more familiar with the HW. I don't think this
will fix *all* cases of flip_done timeouts, but it may address some of them.

(But timeouts aside, we *should* transition to vline since it's more reliable
than vstartup.)

To answer your questions: depending on the DCN generation, a few things can
prevent vstartup from firing:

* DPG - DCN can Dynamically Power Gate parts of the display pipe when a
  self-refresh capable eDP is connected. DPG is engaged when there are enough
  static frames (detected through drm_vblank_off). Once gated, vstartup is
  masked even though the OTG (output timing generator) is still enabled. vline
  is unaffected.

* GSL - Driver can use the Global Sync Lock to block HW from latching
  double-buffered registers during programming, so that HW never latches a
  partially programmed state. This masks vstartup, but vline is unaffected.
  See dcn20_pipe_control_lock().

* MALL - A DCN-accessible cache introduced in DCN32+ DGPUs that can store fb
  data to allow for longer DRAM sleep. When scanning out from MALL, vstartup is
  masked, but vline is unaffected.
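
As a rough sketch of the three cases above, here's a toy model in plain C. This
is not driver code: the struct, its fields, and both function names are made up
for illustration, and which conditions actually apply depends on the DCN
generation.

```c
#include <stdbool.h>

/* Hypothetical model of the masking conditions; not amdgpu/DC code. */
struct otg_state {
	bool otg_enabled;   /* output timing generator is running */
	bool dpg_gated;     /* display pipe is dynamically power gated */
	bool gsl_locked;    /* global sync lock held during programming */
	bool mall_scanout;  /* scanning out from MALL */
};

/* vstartup is delivered only if the OTG runs and nothing masks it. */
bool vstartup_fires(const struct otg_state *s)
{
	return s->otg_enabled &&
	       !s->dpg_gated && !s->gsl_locked && !s->mall_scanout;
}

/* vline depends only on the OTG being active. */
bool vline_fires(const struct otg_state *s)
{
	return s->otg_enabled;
}
```

With otg_enabled and any one of the other three flags set, vstartup_fires()
returns false while vline_fires() still returns true, which is the gap that
moving vblank handling to vline2 is meant to close.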

> 
> Many users have experienced flip_done timeouts while playing games.
> In that scenario, would any idle optimization be enabled, or is there a
> "HW lock"?

If the game stops submitting frames for ~15 refresh cycles, it's possible that
PSR kicks in, though I know plenty of reporters are running on external
monitors without PSR support. On DGPUs, it's very likely due to MALL: a
reporter I was debugging with said disabling MALL showed good results[1]. On an
APU with an external monitor, the cause is less clear.

A lot of the reporters seem to be running Phoenix (DCN314), with a common
symptom of DMUB timing out[2]. If a self-refresh panel is involved, then I'm
curious if this vline2 patch would help. Hamza's recent patch[3] that enables
various levels of reset may help to mitigate, but it doesn't fix the root cause.
I'm planning a branch with this patch and [3], along with debug dumps on
flip_done timeouts for reporters to try.

[1]https://lore.kernel.org/amd-gfx/[email protected]/
[2]https://gitlab.freedesktop.org/drm/amd/-/work_items/4831
[3]https://lore.kernel.org/lkml/[email protected]/

> 
>> DCN hardware provides 3 generic OTG interrupts that can be programmed to
>> fire on a specific line. Vline 0 and 1 are currently reserved, with
>> vline2 available to use for event delivery. These interrupts cannot
>> be masked, as long as the OTG is active.
>>
>> [How]
>>
>> Switch to vline2 for vblank handling. Today, DC will program the
>> vline2 position to be at vupdate -- the point at which HW latches to
>> double-buffered registers.
>>
>> Since all the vline interrupt types share the same interrupt src_id,
>> refactor the existing vline0 infrastructure to allow for all the vline0,
>> 1, and 2 types.
>>
>> Since this is intended to replace vstartup for DCN, use the same handler
>> logic, but be careful to leave DCE on vstartup.
> Why not also switch DCE?
> Does DCE not have the vline interrupts, or does it not have the same issue
> with the vstartup interrupt?

I didn't want to touch DCE since I don't have information on how these
interrupts behave on it, and I didn't want to regress anything. I'd need to
do some digging to find out.

- Leo

> 
>> Signed-off-by: Leo Li <[email protected]>
> I think this patch should have a "Fixes:" tag or another way to indicate that 
> it should be backported to stable kernels.
> 
> Thanks,
> Timur
