On 2026/5/7 06:03, Timur Kristóf wrote:
On Wednesday, May 6, 2026 10:00:12 PM Central European Summer Time Leo Li wrote:
On 2026-05-04 16:54, Timur Kristóf wrote:
On Monday, May 4, 2026 8:36:49 PM Central European Summer Time [email protected] wrote:
From: Leo Li <[email protected]>

[Why]

VStartup is an OTG event that fires when the pixel pipeline prepares for
pixel scanout of the next frame. It was previously used to deliver
vblank events for commits that do not trigger a fb address update, and
hence a pflip interrupt (hw cursor updates, for example).

The issue with vstartup is that HW can mask the interrupt in cases where
idle optimizations are enabled or when a HW lock is active. This could
explain the range of flip_done timeouts frequently seen in the wild.
Can you help me understand how that could happen with vstartup?
Specifically, what is a "HW lock" and when is it active?
Hi Timur,

I should've prefaced this patch to say that this is a theoretical fix. I
haven't been able to reproduce the timeout issues myself, and this patch
came out of internal discussions with folks more familiar with the HW. I
don't think this will fix *all* cases of flip_done timeouts, but it may
address some of them.
I see.
Yeah, I've only very rarely seen that issue myself. Seems that the bug avoids
driver devs, but it's very popular among end users.

Btw, according to Michele's test results, such issues can be hidden by
debug options because the code runs slower:

https://lore.kernel.org/amd-gfx/[email protected]/


(But timeouts aside, we *should* transition to vline since it's more
reliable than vstartup.)
I agree.

To answer your questions: depending on the DCN generation, a few things
can affect whether vstartup fires:

* DPG - DCN can Dynamically Power Gate parts of the display pipe when a
  self-refresh capable eDP is connected. DPG is engaged when there are
  enough static frames (detected through drm_vblank_off). Once gated,
  vstartup is masked even though the OTG (output timing generator) is
  still enabled. vline is unaffected.

* GSL - The driver can use the Global Sync Lock to block HW from latching
  onto double-buffered registers during programming, which prevents HW
  from latching onto a partially programmed state. This masks vstartup,
  but vline is unaffected. See dcn20_pipe_control_lock().

* MALL - A DCN-accessible cache, introduced in DCN32+ dGPUs, that can
  store fb data to allow for longer DRAM sleep. When scanning out from
  MALL, vstartup is masked; vline is unaffected.
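To make the difference concrete, here is a toy C model of the three
conditions listed above. It is not amdgpu or DC code: struct pipe_model
and the helper functions are hypothetical names for illustration only, and
the model ignores per-generation differences (e.g. MALL only exists on
DCN32+ dGPUs).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of one display pipe (illustration only, not a DC
 * structure). Each flag mirrors one condition from the list above.
 */
struct pipe_model {
	bool otg_enabled;  /* output timing generator is running */
	bool dpg_engaged;  /* Dynamic Power Gating after enough static frames */
	bool gsl_locked;   /* Global Sync Lock held during programming */
	bool mall_scanout; /* scanning out from MALL (DCN32+ dGPUs) */
};

/* vstartup is masked by DPG, GSL, or MALL scanout, even with the OTG on. */
static bool vstartup_fires(const struct pipe_model *p)
{
	return p->otg_enabled && !p->dpg_engaged && !p->gsl_locked &&
	       !p->mall_scanout;
}

/* vline interrupts cannot be masked as long as the OTG is active. */
static bool vline_fires(const struct pipe_model *p)
{
	return p->otg_enabled;
}

/*
 * An event armed on an interrupt that never fires is never delivered;
 * in the real driver that would show up as a flip_done timeout.
 */
static bool event_delivered(const struct pipe_model *p, bool use_vline)
{
	return use_vline ? vline_fires(p) : vstartup_fires(p);
}
```

For example, a pipe with otg_enabled and mall_scanout set gives
vstartup_fires() == false but vline_fires() == true, which is the
rationale for moving event delivery to vline.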
Thanks for the explanation.
Just one more question: does DCN always mask the VSTARTUP interrupt under
those conditions or is that configurable?

Many users have experienced flip_done timeouts while playing games.
In that scenario, would any idle optimization be enabled or is there a "HW
lock"?
If the game stops submitting frames for ~15 refresh cycles, it's possible
that PSR kicks in. That said, I know there are plenty of reporters running
external displays without PSR support. On dGPUs, it's very likely due to
MALL. A reporter I was debugging with said disabling MALL showed good
results [1]. If it's an APU with an external monitor, then that's less clear.

A lot of the reporters seem to be running Phoenix (DCN314), with a common
symptom of DMUB timing out [2]. If a self-refresh panel is involved, then I'm
curious whether this vline2 patch would help. Hamza's recent patch [3], which
enables various levels of reset, may help mitigate the issue, but it doesn't
fix the root cause. I'm planning to put together a branch with this patch and
[3], along with debug dumps on flip_done timeouts, for reporters to try.

That's very nice to hear. I'm crossing my fingers that it works out.

[1] https://lore.kernel.org/amd-gfx/[email protected]/
[2] https://gitlab.freedesktop.org/drm/amd/-/work_items/4831
[3] https://lore.kernel.org/lkml/[email protected]/

DCN hardware provides 3 generic OTG interrupts that can be programmed to
fire on a specific line. Vline 0 and 1 are currently reserved, with vline2
available to use for event delivery. These interrupts cannot be masked as
long as the OTG is active.

[How]

Switch to vline2 for vblank handling. Today, DC programs the vline2
position at vupdate, the point at which HW latches onto double-buffered
registers.

Since all the vline interrupt types share the same interrupt src_id,
refactor the existing vline0 infrastructure to handle the vline0, vline1,
and vline2 types.
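Because vline0, 1 and 2 share one src_id, the refactored handler has to
tell them apart by something other than the src_id. A minimal sketch,
assuming a hypothetical per-OTG status bitmask with one bit per vline (the
real DCN register layout and DC irq_source enums are not shown in this
thread), could dispatch like this. All names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical vline indices; not the DC irq_source enums. */
enum vline_idx { VLINE0, VLINE1, VLINE2, NUM_VLINES };

typedef void (*vline_handler_t)(unsigned int otg_inst);

/* One handler slot per vline type, shared by all OTG instances. */
static vline_handler_t vline_handlers[NUM_VLINES];

static void register_vline_handler(enum vline_idx idx, vline_handler_t fn)
{
	assert(idx < NUM_VLINES);
	vline_handlers[idx] = fn;
}

/*
 * Called for the single shared vline src_id. @status is an assumed
 * bitmask where bit n means "vline n fired"; each set bit with a
 * registered handler is routed to it. Returns the number of handlers run.
 */
static int dispatch_vline_irq(unsigned int otg_inst, uint32_t status)
{
	int handled = 0;

	for (int i = 0; i < NUM_VLINES; i++) {
		if (!(status & (1u << i)) || !vline_handlers[i])
			continue;
		vline_handlers[i](otg_inst);
		handled++;
	}
	return handled;
}

/* Hypothetical stand-in for the vblank/event logic vstartup used to drive. */
static unsigned int vblank_events;

static void vblank_on_vline2(unsigned int otg_inst)
{
	(void)otg_inst;
	vblank_events++;
}
```

With only vline2 registered (vline0/1 stay reserved for their existing
users), a status with the vline0 bit set is ignored here, while the vline2
bit reaches vblank_on_vline2().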

Since this is intended to replace vstartup for DCN, use the same handler
logic, but be careful to leave DCE on vstartup.
Why not also switch DCE?
Does DCE not have the vline interrupts or does it not have the same issue
with the vstartup interrupt?
I didn't want to touch DCE since I don't have information on how these
interrupts behave on them, and I didn't want to regress anything. Would need
to do some digging to find out.

Do we have any reports of these page flip timeouts on DCE?
Maybe it's better to leave DCE well enough alone if the issue doesn't exist
there. (I have never seen one, but that doesn't mean it doesn't exist.)

Best regards,
Timur