On 10.04.2018 23:45, Cyr, Aric wrote:
For video games we have a similar situation where a frame is rendered
for a certain world time and in the ideal case we would actually
display the frame at this world time.
That seems like it would be a poorly written game that flips like
that, unless it's explicitly trying to throttle the framerate for
some reason. When a game presents a completed frame, it would like
that to happen as soon as possible.
What you're describing is what most games have been doing traditionally.
Croteam's research shows that this results in micro-stuttering, because
frames may be presented too early. To avoid that, they want to
explicitly time each presentation as described by Christian.
Yes, I agree completely. However, that's only truly relevant for fixed
refresh rate displays.
No, it also affects variable refresh; possibly even more in some cases,
because the presentation time is less predictable.
Yes, and that's why you don't want to do it when you have variable refresh.
The hardware in the monitor and GPU will do it for you,
so why bother?
I think Michel's point is that the monitor and GPU hardware *cannot*
really do this, because there's synchronization with audio to take into
account, which the GPU or monitor don't know about.
How does it work fine today, given that all the kernel seems to know is
"flip now" or "flip at the next vblank"?
Presumably the applications somehow schedule all this just fine.
If this works without variable refresh for 60Hz, will it not work for a fixed-rate
"48Hz" monitor (assuming a 24Hz video)?
You're right. I guess a better way to state the point is that it
*doesn't* really work today with fixed refresh, but if we're going to
introduce a new API, then why not do so in a way that can fix these
additional problems as well?
Also, as I wrote separately, there's the case of synchronizing multiple
displays.
For multimonitor to work with VRR, the timing and flips will have to be
synchronized across displays.
This is impossible for an application to manage; it needs driver/HW control, or
you end up with one display flipping before the other and it looks terrible.
And definitely forget about multiGPU without professional workstation-type
hardware, which has the support needed to sync the displays across adapters.
I'm not a display expert, but I find it hard to believe that it's that
difficult. Perhaps you can help us understand?
Say you have a multi-GPU system, and each GPU has multiple displays
attached, and a single application is driving them all. The application
queues flips for all displays with the same target_present_time_ns
attribute. Starting at some time T, the application simply asks for the
same present time T + i * 16666667 (or whatever) for frame i from all
displays.
Of course it's to be expected that some (or all) of the displays will
not be able to hit the target time on the first bunch of flips due to
hardware limitations, but as long as the range of supported frame times
is wide enough, I'd expect all of them to drift towards presenting at
the correct time eventually, even across multiple GPUs, with this simple
scheme.
Why would that not work to sync up all displays almost perfectly?
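The per-frame schedule described above is plain arithmetic over a shared clock. A minimal sketch (the helper name is made up for illustration; the 16666667 ns period is just the example's ~60Hz figure):

```c
#include <stdint.h>

/* Hypothetical helper: absolute target present time for frame i,
 * given a start time T and a fixed frame period, all in nanoseconds.
 * Every display on every GPU is handed the same value, so they all
 * converge on the same presentation instants. */
static uint64_t target_present_time_ns(uint64_t start_ns,
                                       uint64_t frame_period_ns,
                                       uint64_t i)
{
    return start_ns + i * frame_period_ns;
}
```

The application would attach this value to each display's flip for frame i, e.g. `target_present_time_ns(T, 16666667, i)`.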
Are there any real problems with exposing an absolute target present time?
Realistically, how far into the future are you requesting a presentation time?
Won't it almost always be something like current_time+1000/video_frame_rate?
If so, why not just tell the driver to set 1000/video_frame_rate and have the
GPU/monitor create nicely spaced VSYNCs for you that match the source content?
In fact, you probably wouldn't even need to change your video player at all,
other than having it pass the target_frame_duration_ns. You could consider
this a 'hint' as you suggested, since it cannot be guaranteed in cases where
the driver or HW doesn't support variable refresh. If the target_frame_duration_ns
hint is supported/applied, then the video app should have nothing extra to do
that it wouldn't already do for any arbitrary fixed-refresh-rate display. If
it's not supported (say drm_atomic_check fails with -EINVAL or something), the
video app can fall back and stop requesting a fixed target_frame_duration_ns.
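The hint-then-fallback path can be sketched as a tiny wrapper. The commit callback and the "0 means no hint" convention are assumptions of this sketch, standing in for a real drm_atomic_check/commit sequence:

```c
#include <errno.h>

/* Stand-in for a real atomic commit: returns 0 on success or a
 * negative errno such as -EINVAL if the target_frame_duration_ns
 * property is not supported. */
typedef int (*commit_fn)(long target_frame_duration_ns);

/* Try committing with the duration hint; on -EINVAL, retry without it
 * (duration 0 meaning "no hint" is an assumption of this sketch). */
static int commit_with_hint(commit_fn commit, long duration_ns)
{
    int ret = commit(duration_ns);
    if (ret == -EINVAL)
        ret = commit(0); /* fall back: stop requesting a fixed duration */
    return ret;
}

/* Illustrative stub: a driver that rejects any nonzero hint. */
static int fake_commit_rejects_hint(long d)
{
    return d ? -EINVAL : 0;
}
```

With this shape, a player that always calls `commit_with_hint` works unchanged on both VRR and fixed-refresh drivers.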
A fundamental problem I have with a target present time, though, is how to
accommodate present times longer than one VSYNC period. If my monitor has a 40Hz-60Hz
variable refresh, it's easy to translate "my content is 24Hz, repeat this next frame
an integer multiple number of times so that it lands within the monitor range".
The driver fixes the display to an even 48Hz and everything is good (no worse than a
30Hz clip on a traditional 60Hz display anyway). This frame-doubling is all hardware
based and doesn't require any polling.
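Picking the repeat count is a small search; a sketch using the example's numbers (the function name is made up):

```c
/* Smallest integer multiple m such that content_hz * m lands inside
 * the monitor's [min_hz, max_hz] variable refresh range; returns 0
 * if no such multiple exists. For 24Hz content on a 40Hz-60Hz panel
 * this yields m = 2, i.e. the 48Hz case described above. */
static int repeat_multiple(int content_hz, int min_hz, int max_hz)
{
    for (int m = 1; content_hz * m <= max_hz; m++)
        if (content_hz * m >= min_hz)
            return m;
    return 0;
}
```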
Now if you change that to "show my content in at least X nanoseconds" it can work on all
displays, but the intent of the app is gone and driver/GPU/display cannot optimize. For example,
the HDMI VRR spec defines a "CinemaVRR" mode where the target refresh rate error is
accounted for based on a 0.1% deviation from the requested rate, and the v_total lines are
incremented/decremented to compensate. If we don't know the target rate, we will not be
able to comply with this industry standard.
Okay, that's interesting. Does this mean that the display driver still
programs a refresh rate to some hardware register?
What if you want to initiate some CPU-controlled drift, i.e. you know
you're targeting 2*24Hz, but you'd like to shift all flip times to be X
ms later? Can you program the hardware for that, and how does it work? Do
you have to twiddle the refresh rate, or can the hardware do it natively?
How about what I wrote in an earlier mail of having attributes:
- target_present_time_ns
- hint_frame_time_ns (optional)
... and if a video player set both, the driver could still do the
optimizations you've explained?
Also, how would you manage an absolute target present time in kernel? I guess app and
driver need to use a common system clock or tick count, but when would you know to 'wake
up' and execute the flip? If you wait for VSYNC, then you'll always time out on
v_total_max (i.e. minimum refresh rate), check your time and see "yup, need to
present now" and then flip. Now your monitor just jumped from lowest refresh rate
to something else which can cause other problems. If you use some timer, then you're
burning needless power polling some counter and still wouldn't have the same accuracy you
could achieve with a fixed duration.
For the clock, we just have to specify which one to take. I believe
CLOCK_MONOTONIC makes the most sense for this kind of thing.
For your other questions, I'm afraid I just don't know enough about
modern display hardware to give a really good answer, but with my naive
understanding I would imagine something like the following:
1. When the atomic commit happens, the driver twiddles with the display
timings to get the start of scanout for the next frame as close as
possible to the specified target present time (I assume this is what
v_total_max is about?)
2. The kernel then schedules a timer for the time when the display
hardware is finished scanning out the previous frame and starts vblank.
3. In the handler for that timer, the kernel checks whether any fence
associated to the new frame's surface has signaled. If yes, it changes
the display hardware's framebuffer pointer to the new frame. Otherwise,
it atomically registers the handler to be run again when the fence signals.
3b. The handler should check if vblank has already ended (either due to
extreme CPU overload or because the fence was signaled too late).
Actually, that last point makes me wonder how the case of "present ASAP"
is actually implemented in hardware.
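The handler logic from steps 2-3b can be reduced to a toy state machine, with the fence collapsed to a flag. This is only a model of my naive reading of the flow above, not how any driver actually implements it:

```c
#include <stdbool.h>

/* Toy model of the vblank-timer handler from steps 2-3b above.
 * Returns true if the flip was executed in this vblank; on false,
 * *rearmed_on_fence tells whether the handler scheduled itself to
 * run again when the fence signals (step 3) or simply missed the
 * vblank window (step 3b). */
static bool vblank_timer_handler(bool fence_signaled, bool vblank_ended,
                                 bool *rearmed_on_fence)
{
    *rearmed_on_fence = false;
    if (!fence_signaled) {
        /* Step 3: frame not ready yet, wait for the fence. */
        *rearmed_on_fence = true;
        return false;
    }
    if (vblank_ended) {
        /* Step 3b: too late, the flip slips to the next vblank. */
        return false;
    }
    /* Step 3: update the framebuffer pointer inside vblank. */
    return true;
}
```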
But again, all this is just from my naive understanding of the display
hardware.
amd-gfx mailing list