On 2026/5/7 06:03, Timur Kristóf wrote:
> On Wednesday, May 6, 2026 10:00:12 PM Central European Summer Time Leo Li wrote:
>> On 2026-05-04 16:54, Timur Kristóf wrote:
>>> On Monday, May 4, 2026 8:36:49 PM Central European Summer Time [email protected] wrote:
>>>> From: Leo Li <[email protected]>
>>>>
>>>> [Why]
>>>>
>>>> VStartup is an OTG event that fires when the pixel pipeline prepares for pixel scanout of the next frame. It was previously used to deliver vblank events for commits that do not trigger a fb address update, and hence a pflip interrupt (hw cursor updates, for example).
>>>>
>>>> The issue with vstartup is that HW can mask the interrupt in cases where idle optimizations are enabled or when a HW lock is active. This could explain the range of flip_done timeouts frequently seen in the wild.
>>>
>>> Can you help me understand how that could happen with vstartup? Specifically, what is a "HW lock" and when is it active?
>>
>> Hi Timur,
>>
>> I should've prefaced this patch to say that this is a theoretical fix. I haven't been able to reproduce the timeout issues myself, and this patch came out of internal discussions with folks more familiar with the HW. I don't think this will fix *all* cases of flip_done timeouts, but it may address some of them.
>
> I see. Yeah, I've only very rarely seen that issue myself. It seems the bug avoids driver devs, but it's very popular among end users.

Btw, according to Michele's test results, this issue can be hidden when debug options are enabled, because the code runs slower: https://lore.kernel.org/amd-gfx/[email protected]/

>> (But timeouts aside, we *should* transition to vline since it's more reliable than vstartup.)
>
> I agree.
>
>> To answer your questions: depending on the DCN generation, there are a few things that can affect vstartup firing:
>>
>> * DPG - DCN can Dynamically Power Gate parts of the display pipe when a self-refresh capable eDP is connected. DPG is engaged when there are enough static frames (detected through drm_vblank_off). Once gated, even though the OTG (output timing generator) is still enabled, vstartup is masked. vline is unaffected.
>>
>> * GSL - The driver can use the Global Sync Lock to block HW from latching onto double-buffered registers during programming, so that HW does not latch onto a partially programmed state. This masks vstartup, but vline is unaffected. See dcn20_pipe_control_lock().
>>
>> * MALL - A DCN-accessible cache, introduced in DCN32+ dGPUs, that can store fb data to allow for longer DRAM sleep. When scanning out from MALL, vstartup is masked; vline is unaffected.
>
> Thanks for the explanation. Just one more question: does DCN always mask the VSTARTUP interrupt under those conditions, or is that configurable?
>
>>> Many users have experienced flip_done timeouts while playing games. In that scenario, would any idle optimization be enabled, or is there a "HW lock"?
>>
>> If the game stops submitting frames for ~15 refresh cycles, it's possible that PSR kicks in. Though I know there are plenty of reporters running on external displays without PSR support.
>>
>> If it's a dGPU, it's very likely due to MALL. A reporter I was debugging with said disabling MALL showed good results[1].
>>
>> If it's an APU with an external monitor, then that's less clear. A lot of the reporters seem to be running Phoenix (DCN314), with a common symptom of DMUB timing out[2]. If a self-refresh panel is involved, then I'm curious whether this vline2 patch would help. Hamza's recent patch[3], which enables various levels of reset, may help mitigate the issue, but it doesn't fix the root cause.
>> I'm planning a branch with this patch and [3], along with debug dumps on flip_done timeouts, for reporters to try.
>
> That's very nice to hear. I'm crossing my fingers that it works out.
>
>> [1] https://lore.kernel.org/amd-gfx/e415c38b-4102-40e4-a195-0256caf34802@m1k.cloud/
>> [2] https://gitlab.freedesktop.org/drm/amd/-/work_items/4831
>> [3] https://lore.kernel.org/lkml/20260505182105.420525-2-someguy@effective-light.com/
>>
>>>> DCN hardware provides 3 generic OTG interrupts that can be programmed to fire on a specific line. Vline 0 and 1 are currently reserved, leaving vline2 available for event delivery. These interrupts cannot be masked as long as the OTG is active.
>>>>
>>>> [How]
>>>>
>>>> Switch to vline2 for vblank handling. Today, DC will program the vline2 position to be at vupdate, the point at which HW latches onto double-buffered registers.
>>>>
>>>> Since all the vline interrupt types share the same interrupt src_id, refactor the existing vline0 infrastructure to handle all of the vline0, 1, and 2 types.
>>>>
>>>> Since this is intended to replace vstartup for DCN, use the same handler logic, but be careful to leave DCE on vstartup.
>>>
>>> Why not also switch DCE? Does DCE not have the vline interrupts, or does it not have the same issue with the vstartup interrupt?
>>
>> I didn't want to touch DCE since I don't have information on how these interrupts behave there, and I didn't want to regress anything. I would need to do some digging to find out.
>
> Do we have any reports of these page flip timeouts on DCE? Maybe it's better to leave DCE well enough alone if the issue doesn't exist there. (I have never seen one, but that doesn't mean it doesn't exist.)
>
> Best regards,
> Timur
