On 2026-05-06 16:00, Leo Li wrote:
> 
> 
> On 2026-05-04 16:54, Timur Kristóf wrote:
>> On Monday, May 4, 2026 8:36:49 PM Central European Summer Time 
>> [email protected] wrote:
>>> From: Leo Li <[email protected]>
>>>
>>> [Why]
>>>
>>> VStartup is an OTG event that fires when the pixel pipeline prepares for
>>> pixel scanout of the next frame. It was previously used to deliver
>>> vblank events for commits that do not trigger a fb address update, and
>>> hence a pflip interrupt (hw cursor updates, for example).
>>>
>>> The issue with vstartup is that HW can mask the interrupt in cases where
>>> idle optimizations are enabled or when a HW lock is active. This could
>>> explain the range of flip_done timeouts frequently seen in the wild.
>> Can you help me understand how that could happen with vstartup?
>> Specifically, what is a "HW lock" and when is it active?
> 
> Hi Timur,
> 
> I should've prefaced this patch by saying that this is a theoretical fix. I
> haven't been able to reproduce the timeout issues myself, and this patch came
> out of internal discussions with folks more familiar with the HW. I don't
> think this will fix *all* cases of flip_done timeouts, but it may address some
> of them.
> 
> (But timeouts aside, we *should* transition to vline since it's more reliable
> than vstartup.)
> 
> To answer your questions: depending on the DCN generation, there can be a few
> things that affect vstartup firing:
> 
> * DPG - DCN can Dynamically Power Gate parts of the display pipe when a
>   self-refresh capable eDP is connected. DPG is engaged when there are enough
>   static frames (detected through drm_vblank_off). Once gated, even though the
>   OTG (output timing generator) is still enabled, vstartup is masked. vline is
>   unaffected.
> 
> * GSL - Driver can use the Global Sync Lock to block HW from latching onto
>   double-buffered registers during programming, to prevent HW from latching
>   onto a partially programmed state. This will mask vstartup, but vline is
>   unaffected. See dcn20_pipe_control_lock().
> 
> * MALL - A DCN-accessible cache introduced in DCN32+ DGPUs that can store fb
>   data to allow for longer DRAM sleep. When scanning out from MALL, vstartup
>   is masked; vline is unaffected.
> 
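A rough way to picture the three cases listed above is a small model, sketched here in C. This is not amdgpu code: the struct, its field names, and both helper functions are hypothetical and only encode the behaviour as described in this thread (vstartup is masked by DPG, GSL, or MALL scanout, while vline fires as long as the OTG is active).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the conditions described above; not driver code. */
struct otg_state {
	bool otg_enabled;   /* output timing generator is running */
	bool dpg_gated;     /* display pipe is dynamically power gated */
	bool gsl_locked;    /* Global Sync Lock held during programming */
	bool mall_scanout;  /* scanning out from MALL */
};

/* vstartup is delivered only when none of the masking conditions hold. */
bool vstartup_fires(const struct otg_state *s)
{
	assert(s != NULL);
	return s->otg_enabled && !s->dpg_gated && !s->gsl_locked &&
	       !s->mall_scanout;
}

/* vline is described as unmaskable as long as the OTG is active. */
bool vline_fires(const struct otg_state *s)
{
	assert(s != NULL);
	return s->otg_enabled;
}
```

With only otg_enabled and mall_scanout set, vstartup_fires() returns false while vline_fires() still returns true. In this model, that is the situation where an event tied to vstartup would never be delivered.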
>>
>> Many users have experienced flip_done timeouts while playing games.
>> In that scenario, would any idle optimization be enabled or is there a
>> "HW lock"?
> 
> If the game stops submitting frames for ~15 refresh cycles, it's possible that
> PSR kicks in, though I know there are plenty of reporters running on external
> monitors without PSR support. If it's DGPUs, it's very likely due to MALL. A
> reporter I was debugging with said disabling MALL showed good results[1]. If
> it's an APU with an external monitor, then that's less clear.
> 
> A lot of the reporters seem to be running Phoenix (DCN314), with a common
> symptom of DMUB timing out[2]. If a self-refresh panel is involved, then I'm
> curious if this vline2 patch would help. Hamza's recent patch[3] that enables
> various levels of reset may help to mitigate, but it doesn't fix the root
> cause. I'm planning a branch with this patch and [3], along with debug dumps
> on flip_done timeouts for reporters to try.
> 
> [1]https://lore.kernel.org/amd-gfx/[email protected]/
> [2]https://gitlab.freedesktop.org/drm/amd/-/work_items/4831
> [3]https://lore.kernel.org/lkml/[email protected]/
> 
>>
>>> DCN hardware provides 3 generic OTG interrupts that can be programmed to
>>> fire on a specific line. Vline 0 and 1 are currently reserved, with
>>> vline2 available to use for event delivery. These interrupts cannot
>>> be masked, as long as the OTG is active.
>>>
>>> [How]
>>>
>>> Switch to vline2 for vblank handling. Today, DC will program the
>>> vline2 position to be at vupdate -- the point at which HW latches to
>>> double-buffered registers.
>>>
>>> Since all the vline interrupt types share the same interrupt src_id,
>>> refactor the existing vline0 infrastructure to allow for all the vline0,
>>> 1, and 2 types.
>>>
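The shared-src_id refactor described above could be pictured as a single entry point that fans out by vline type. The following is only a sketch: the enum, struct, handler, and MAX_OTG value are hypothetical names, not the patch's actual code, and it assumes the vline type can be recovered from some secondary field of the interrupt.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_OTG 6  /* hypothetical number of OTG instances */

enum vline_type { VLINE0, VLINE1, VLINE2, VLINE_COUNT };

typedef void (*vline_handler_t)(int otg_inst);

/* One handler slot per vline type, all reached through one interrupt source. */
struct vline_dispatch {
	vline_handler_t handlers[VLINE_COUNT];
};

/* Hypothetical event counter standing in for vblank event delivery. */
int events_delivered[MAX_OTG];

void deliver_vblank_event(int otg_inst)
{
	if (otg_inst >= 0 && otg_inst < MAX_OTG)
		events_delivered[otg_inst]++;
}

/* Returns 0 if a handler ran, -1 if the type is invalid or unhandled. */
int vline_dispatch_irq(const struct vline_dispatch *d, enum vline_type type,
		       int otg_inst)
{
	assert(d != NULL);
	if ((unsigned int)type >= VLINE_COUNT || d->handlers[type] == NULL)
		return -1;
	d->handlers[type](otg_inst);
	return 0;
}
```

Registering deliver_vblank_event only in the VLINE2 slot leaves VLINE0 and VLINE1 untouched, which mirrors the idea that those two stay reserved while vline2 carries event delivery.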
>>> Since this is intended to replace vstartup for DCN, use the same handler
>>> logic, but be careful to leave DCE on vstartup.
>> Why not also switch DCE?
>> Does DCE not have the vline interrupts or does it not have the same issue
>> with the vstartup interrupt?
> 
> I didn't want to touch DCE since I don't have information on how these
> interrupts behave on it, and I didn't want to regress anything. Would need
> to do some digging to find out.
> 

DCE's architecture is quite different in this regard. There are no VSTARTUP or
VUPDATE signals and interrupts on DCE.

Harry

> - Leo
> 
>>
>>> Signed-off-by: Leo Li <[email protected]>
>> I think this patch should have a "Fixes:" tag or another way to indicate
>> that it should be backported to stable kernels.
>>
>> Thanks,
>> Timur
> 
