On Tue, Aug 11, 2015 at 04:59:14PM +0300, [email protected] wrote:
> From: Ville Syrjälä <[email protected]>
> 
> Currently we don't clflush on pin_to_display if the bo is already
> UC/WT and is not in the CPU write domain. This causes problems with
> pwrite since pwrite doesn't change the write domain, and it avoids
> clflushing on UC/WT buffers on LLC platforms unless the buffer is
> currently being scanned out.
> 
> Fix the problem by marking the cache dirty and adjusting
> i915_gem_object_set_cache_level() to clflush when the cache is dirty
> even if the cache_level doesn't change.
> 
> My last attempt [1] at fixing this via write domain frobbing was shot
> down, but now with the cache_dirty flag we can do things in a nicer way.
> 
> [1] http://lists.freedesktop.org/archives/intel-gfx/2014-November/055390.html
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=86422
> Testcase: igt/kms_pwrite_crc
> Signed-off-by: Ville Syrjälä <[email protected]>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 73293b4..73eff2e 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1005,12 +1005,15 @@ out:
>               if (!needs_clflush_after &&
>                   obj->base.write_domain != I915_GEM_DOMAIN_CPU) {
>                       if (i915_gem_clflush_object(obj, obj->pin_display))
> -                             i915_gem_chipset_flush(dev);
> +                             needs_clflush_after = true;
>               }
>       }
>  
>       if (needs_clflush_after)
>               i915_gem_chipset_flush(dev);
> +     else if (obj->cache_level == I915_CACHE_NONE ||
> +              obj->cache_level == I915_CACHE_WT)
> +             obj->cache_dirty = true;
>  
>       intel_fb_obj_flush(obj, false, ORIGIN_CPU);
>       return ret;

Ok, this seems reasonable.

> @@ -3639,10 +3642,10 @@ int i915_gem_object_set_cache_level(struct 
> drm_i915_gem_object *obj,
>  {
>       struct drm_device *dev = obj->base.dev;
>       struct i915_vma *vma, *next;
> -     int ret;
> +     int ret = 0;
>  
>       if (obj->cache_level == cache_level)
> -             return 0;
> +             goto out;
>  
>       if (i915_gem_obj_is_pinned(obj)) {
>               DRM_DEBUG("can not change the cache level of pinned objects\n");
> @@ -3687,6 +3690,7 @@ int i915_gem_object_set_cache_level(struct 
> drm_i915_gem_object *obj,
>               vma->node.color = cache_level;
>       obj->cache_level = cache_level;
>  
> +out:
>       if (obj->cache_dirty &&
>           obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
>           cpu_write_needs_clflush(obj)) {

This we can do better (and I know I am guilty of the original sin). It
just feels a little too loose that we expect pin-to-display-plane should
always call set-cache-level (yes, it has to, but it feels like we are
putting pin-to-display-plane specifics into set-cache-level).

I think this chunk is much more understandable if we did:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5e7fcf77e57a..fc6bcc19cca1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3656,13 +3656,6 @@ int i915_gem_object_set_cache_level(struct 
drm_i915_gem_object *obj,
                vma->node.color = cache_level;
        obj->cache_level = cache_level;
 
-       if (obj->cache_dirty &&
-           obj->base.write_domain != I915_GEM_DOMAIN_CPU &&
-           cpu_write_needs_clflush(obj)) {
-               if (i915_gem_clflush_object(obj, true))
-                       i915_gem_chipset_flush(obj->base.dev);
-       }
-
        return 0;
 }
 
@@ -3795,6 +3788,10 @@ i915_gem_object_pin_to_display_plane(struct 
drm_i915_gem_object *obj,
        WARN_ON(obj->pin_display > vma->pin_count);
 
        i915_gem_object_flush_cpu_write_domain(obj);
+       if (obj->cache_dirty) {
+               if (i915_gem_clflush_object(obj, true))
+                       i915_gem_chipset_flush(obj->base.dev);
+       }
 
        /* It should now be out of any other write domains, and we can update
         * the domain values for our changes.

Maybe even

/* Whilst the object was away from the scanout we may have been eliding the 
coherent
 * writes into the CPU cache. However, the moment it has to be read by the 
display
 * engine, those writes into the CPU cache become inaccessible and so we have 
to 
 * clflush them out to main memory. We track elided flushes with 
obj->cache_dirty
 * and hope they are rare.
 */


The other user of set-cache-level (set_caching_ioctl) should not care
about the clflush side-effect and the clflush elision should work just
fine instead.

Either way,
Reviewed-by: Chris Wilson <[email protected]>
but I'd prefer a v2 with the comments :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to