On Tue, Oct 18, 2016 at 09:01:30AM +0100, Tvrtko Ursulin wrote:
> 
> On 17/10/2016 15:10, Chris Wilson wrote:
> >When handling execbuf relocations, we play a delicate dance with
> >pagefault. We first try to access the user pages underneath our
> >struct_mutex. However, if those pages were inside a GEM object, we may
> >trigger a pagefault and deadlock as i915_gem_fault() tries to
> >recursively acquire struct_mutex. Instead, we choose to disable
> >pagefaulting around the copy_from_user whilst inside the struct_mutex
> >and handle the EFAULT by falling back to a copy outside the
> >struct_mutex.
> >
> >We however presumed that disabling pagefaults would be expensive. It is
> >just an operation on the local current task. Cheap enough that we can
> >restrict the disable/enable to the critical section around the copy, and
> >so avoid having to handle the atomic sections within the relocation
> >handling itself.
> >
> >Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> >Cc: Daniel Vetter <daniel.vet...@ffwll.ch>
> >Cc: Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 64 
> > +++++++++++++-----------------
> >  1 file changed, 28 insertions(+), 36 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
> >b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >index 1d02e74ce62d..22342ad0e07f 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >@@ -551,20 +551,6 @@ repeat:
> >     return 0;
> >  }
> >-static bool object_is_idle(struct drm_i915_gem_object *obj)
> >-{
> >-    unsigned long active = i915_gem_object_get_active(obj);
> >-    int idx;
> >-
> >-    for_each_active(active, idx) {
> >-            if (!i915_gem_active_is_idle(&obj->last_read[idx],
> >-                                         &obj->base.dev->struct_mutex))
> >-                    return false;
> >-    }
> >-
> >-    return true;
> >-}
> >-
> >  static int
> >  i915_gem_execbuffer_relocate_entry(struct drm_i915_gem_object *obj,
> >                                struct eb_vmas *eb,
> >@@ -648,10 +634,6 @@ i915_gem_execbuffer_relocate_entry(struct 
> >drm_i915_gem_object *obj,
> >             return -EINVAL;
> >     }
> >-    /* We can't wait for rendering with pagefaults disabled */
> >-    if (pagefault_disabled() && !object_is_idle(obj))
> >-            return -EFAULT;
> >-
> >     ret = relocate_entry(obj, reloc, cache, target_offset);
> >     if (ret)
> >             return ret;
> >@@ -678,12 +660,23 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> >     remain = entry->relocation_count;
> >     while (remain) {
> >             struct drm_i915_gem_relocation_entry *r = stack_reloc;
> >-            int count = remain;
> >-            if (count > ARRAY_SIZE(stack_reloc))
> >-                    count = ARRAY_SIZE(stack_reloc);
> >+            unsigned long unwritten;
> >+            unsigned int count;
> >+
> >+            count = min_t(unsigned int, remain, ARRAY_SIZE(stack_reloc));
> >             remain -= count;
> >-            if (__copy_from_user_inatomic(r, user_relocs, 
> >count*sizeof(r[0]))) {
> >+            /* This is the fast path and we cannot handle a pagefault
> >+             * whilst holding the struct mutex lest the user pass in the
> >+             * relocations contained within a mmaped bo. For in such a case
> >+             * we, the page fault handler would call i915_gem_fault() and
> >+             * we would try to acquire the struct mutex again. Obviously
> >+             * this is bad and so lockdep complains vehemently.
> >+             */
> >+            pagefault_disable();
> >+            unwritten = __copy_from_user_inatomic(r, user_relocs, 
> >count*sizeof(r[0]));
> >+            pagefault_enable();
> >+            if (unwritten) {
> >                     ret = -EFAULT;
> >                     goto out;
> >             }
> >@@ -695,11 +688,19 @@ i915_gem_execbuffer_relocate_vma(struct i915_vma *vma,
> >                     if (ret)
> >                             goto out;
> >-                    if (r->presumed_offset != offset &&
> >-                        __put_user(r->presumed_offset,
> >-                                   &user_relocs->presumed_offset)) {
> >-                            ret = -EFAULT;
> >-                            goto out;
> >+                    if (r->presumed_offset != offset) {
> >+                            /* Copying back to the user is allowed to fail.
> >+                             * The information passed back is a hint as
> >+                             * to the final location. If the copy_to_user
> >+                             * fails after a successful copy_from_user,
> >+                             * it must be a readonly location and so
> >+                             * we presume the user knows what they are
> >+                             * doing!
> >+                             */
> >+                            pagefault_disable();
> >+                            __put_user(r->presumed_offset,
> >+                                       &user_relocs->presumed_offset);
> >+                            pagefault_enable();
> 
> Why is a good idea to ignore potential errors here?

Wrong question: why did we think it a good idea to ignore success here?

(a) it is safe to do so, and I can legitimately setup userspace to use
this
(b) reporting an error after we have committed the change is broken
anyway.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to