On 17/03/2022 07:23, Vivek Kasireddy wrote:
On platforms capable of allowing 8K (7680 x 4320) modes, pinning 2 or
more framebuffers/scanout buffers results in only one that is mappable/
fenceable. Therefore, pageflipping between these 2 FBs where only one
is mappable/fenceable creates latencies large enough to miss alternate
vblanks thereby producing less optimal framerate.

This mainly happens because when i915_gem_object_pin_to_display_plane()
is called to pin one of the FB objs, the associated vma is identified
as misplaced -- because there is no space for it in the aperture --
and therefore i915_vma_unbind() is called which unbinds and evicts it.
This misplaced vma gets subseqently pinned only when
i915_gem_object_ggtt_pin_ww() is called without PIN_MAPPABLE. This whole
thing results in a latency of ~10ms and happens every other repaint cycle.

Just out of curiosity - have you looked at where does this 10ms come from? Like is it simply clearing/writing PTEs so expensive, or there is more to it? Apologies if I asked this before..

Therefore, to fix this issue, we just ensure that the misplaced VMA
does not get evicted when we try to pin it with PIN_MAPPABLE -- by
returning early if the mappable/fenceable flag is not set.

Testcase:
Running Weston and weston-simple-egl on an Alderlake_S (ADLS) platform
with a 8K@60 mode results in only ~40 FPS (compared to ~59 FPS with
this patch). Since upstream Weston submits a frame ~7ms before the
next vblank, the latencies seen between atomic commit and flip event
are 7, 24 (7 + 16.66), 7, 24..... suggesting that it misses the
vblank every other frame.

Here is the ftrace snippet that shows the source of the ~10ms latency:
               i915_gem_object_pin_to_display_plane() {
0.102 us   |    i915_gem_object_set_cache_level();
                 i915_gem_object_ggtt_pin_ww() {
0.390 us   |      i915_vma_instance();
0.178 us   |      i915_vma_misplaced();
                   i915_vma_unbind() {
                   __i915_active_wait() {
0.082 us   |        i915_active_acquire_if_busy();
0.475 us   |      }
                   intel_runtime_pm_get() {
0.087 us   |        intel_runtime_pm_acquire();
0.259 us   |      }
                   __i915_active_wait() {
0.085 us   |        i915_active_acquire_if_busy();
0.240 us   |      }
                   __i915_vma_evict() {
                     ggtt_unbind_vma() {
                       gen8_ggtt_clear_range() {
10507.255 us |        }
10507.689 us |      }
10508.516 us |   }

Cc: Tvrtko Ursulin <[email protected]>
Signed-off-by: Vivek Kasireddy <[email protected]>
---
  drivers/gpu/drm/i915/i915_gem.c | 8 +++++++-
  1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 9747924cc57b..7307c5de1c58 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -939,8 +939,14 @@ i915_gem_object_ggtt_pin_ww(struct drm_i915_gem_object 
*obj,
                        if (i915_vma_is_pinned(vma) || i915_vma_is_active(vma))
                                return ERR_PTR(-ENOSPC);
+ /*
+                        * If this misplaced vma is too big (i.e, at-least
+                        * half the size of aperture) or just unmappable,
+                        * we would not be able to pin with PIN_MAPPABLE.
+                        */

I would be tempted to describe the ping-pong issue in the comment. In short would do it, but just because git blame on a line of code tends to fail to lead to the correct commit message after a while.

So maybe just say along the lines of "If the misplaced vma is too big ... or hasn't been pinned mappable before, we ignore the misplacement when PIN_NONBLOCK is set in order to avoid ping-pong of double (or more) -buffered frame buffers into the aperture and out on every vblank."

                        if (flags & PIN_MAPPABLE &&
-                           vma->fence_size > ggtt->mappable_end / 2)
+                           (vma->fence_size > ggtt->mappable_end / 2 ||
+                           !i915_vma_is_map_and_fenceable(vma)))
                                return ERR_PTR(-ENOSPC);
                }

With the expanded comment it looks good to me.

Reviewed-by: Tvrtko Ursulin <[email protected]>

+ Daniel if he wants to double check it.

Regards,

Tvrtko

Reply via email to