Hi,

thanks for the reply.

Am 02.09.25 um 17:58 schrieb Lyude Paul:
A solution down below + some other things you should be aware of :)

On Tue, 2025-09-02 at 16:16 +0200, Thomas Zimmermann wrote:
Hi

Am 02.09.25 um 15:27 schrieb Ville Syrjälä:
On Mon, Sep 01, 2025 at 01:06:58PM +0200, Thomas Zimmermann wrote:
The vblank timer simulates a vblank interrupt for hardware without
support. Rate-limits the display update frequency.

DRM drivers for hardware without vblank support apply display updates
ASAP. A vblank event informs DRM clients of the completed update.

Userspace compositors immediately schedule the next update, which
creates significant load on virtualization outputs. Display updates
are usually fast on virtualization outputs, as their framebuffers are
in regular system memory and there's no hardware vblank interrupt to
throttle the update rate.

The vblank timer is a HR timer that signals the vblank in software.
It limits the update frequency of a DRM driver similar to a hardware
vblank interrupt. The timer is not synchronized to the actual vblank
interval of the display.

The code has been adopted from vkms, which added the funtionality
in commit 3a0709928b17 ("drm/vkms: Add vblank events simulated by
hrtimers").
Does this suffer from the same deadlocks as well?
https://lore.kernel.org/all/20250510094757.4174662-1-zenghe...@huawei.com/
Thanks for this pointer. It has not been fixed yet, right? If vkms is
affected, this series probably is as well.

Best regards
Thomas

Hi! I am glad I saw this mail fly by the list because I actually have a fix
for this deadlock in my very incomplete vkms port to rust:

https://gitlab.freedesktop.org/lyudess/linux/-/blob/rvkms-slim/drivers/gpu/drm/rvkms/crtc.rs#L336

Basically what we do is keep track of when we're reporting a vblank event from
the hrtimer thread we use to emulate vblanks along with if we're trying to
stop the vblank timer:

https://gitlab.freedesktop.org/lyudess/linux/-/blob/rvkms-slim/drivers/gpu/drm/rvkms/crtc.rs#L336

Stopping the timer happens like this:

https://gitlab.freedesktop.org/lyudess/linux/-/blob/rvkms-slim/drivers/gpu/drm/rvkms/crtc.rs#L232

We grab the lock protecting cancelled and reporting, set cancelled, and then
only attempt to cancel the timer if it's not busy reporting. If it is, we
simply leave the timer be and rely on it noticing that cancelled has been set.

That code still contains race conditions AFAICT.

There are a number of ways to resolve this. I think the best would be to remove the softirq_expiry_lock from the hrtimer's cancel logic. The comment at [1] sounds like it is not really necessary to use that specific lock.

[1] https://elixir.bootlin.com/linux/v6.16.4/source/kernel/time/hrtimer.c#L1453

But IDK about the implications. Another idea is to push drm_handle_vblank() out of the vblank timer callback. But that might confuse DRM's vblank logic.

I ended up with a solution similar to yours, but much simpler. The cancel function signals to the timer to not restart itself. The start function waits for any dangling timer callback to stop itself. The code further uses hrtimer_try_to_cancel(), which is not affected by the deadlock.



The place where we actually unconditionally stop the timer is on
atomic_disable:

https://gitlab.freedesktop.org/lyudess/linux/-/blob/rvkms-slim/drivers/gpu/drm/rvkms/crtc.rs#L136

Which works fine since unlike vblank_disable(), we're outside of the various
vblank_time locks - and thus can wait on hrtimer_cancel() to complete without
worrying about a deadlock.

I'd be careful with that. You're modifying the vblank state behind the back of the vblank helpers. Probably works here, but can also backfire.


JFYI, there is one other fix here you might want too. When vkms disables the
vblank timer and then re-enables it later, I'm fairly certain it doesn't
actually schedule the timer for the correct time:

https://gitlab.freedesktop.org/lyudess/linux/-/blob/rvkms-slim/drivers/gpu/drm/vkms/vkms_crtc.c#L68

Notice that it's a relative timer, and in enable_vblank() we schedule the
timer to happen at now + frame_duration. The problem with this is that we
should be rescheduling the hrtimer for when the next vblank would happen in
relation to when the display pipe originally had vblank events enabled - not
in relation to the current time.

In other words: you could have vblanks enabled, disable them, and then
(frame_duration / 2) nanoseconds later re-enable the timer - meaning that
every vblank interrupt is now (frame_duration / 2) nanoseconds offset from
when the vblank interval should actually be occurring. I'm not sure if this
actually breaks anything though, but it certainly doesn't seem correct. To fix
this in rvkms, we keep a timestamp of when we originally enabled the pipe and
do some modulus fun to figure out the exact timestamp for the next vblank
interval:

https://gitlab.freedesktop.org/lyudess/linux/-/blob/rvkms-slim/drivers/gpu/drm/rvkms/crtc.rs#L168

We also use absolute timers instead of relative to really make sure things get
scheduled at just the right tie.

I've been wondering about this problem as well, but never found any issues with it. See [2] for how it's currently done in vkms. If the timer is still off, it takes the current time for the last vblank. And if the timer is on, it computes the previous vblank time. So the values should be fine.

[2] https://elixir.bootlin.com/linux/v6.17-rc4/source/drivers/gpu/drm/vkms/vkms_crtc.c#L83

On the problem in general: we have the get_scanout_position [3] callback, which would be the right thing to call here. The timeout could be computed from the returned values. For virtual displays it wouldn't matter much. But for hardware, the vblank timer could be synced to the real vblank if the hardware provides the necessary information.

[3] https://elixir.bootlin.com/linux/v6.17-rc4/source/include/drm/drm_modeset_helper_vtables.h#L489

Best regards
Thomas


(I will try to port these fixes over to vkms at some point unless someone else
gets to them first…)


--
--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)


Reply via email to