On Wed, Feb 22, 2017 at 04:53:35PM +0000, Tvrtko Ursulin wrote:
> 
> On 17/02/2017 15:51, Chris Wilson wrote:
> >@@ -4034,9 +4038,8 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >      * is woken.
> >      */
> >     if (engine->irq_seqno_barrier &&
> >-        rcu_access_pointer(engine->breadcrumbs.irq_seqno_bh) == current &&
> >         test_and_clear_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted)) {
> >-            struct task_struct *tsk;
> >+            unsigned long flags;
> >
> >             /* The ordering of irq_posted versus applying the barrier
> >              * is crucial. The clearing of the current irq_posted must
> >@@ -4058,17 +4061,17 @@ __i915_request_irq_complete(const struct drm_i915_gem_request *req)
> >              * the seqno before we believe it coherent since they see
> >              * irq_posted == false but we are still running).
> >              */
> >-            rcu_read_lock();
> >-            tsk = rcu_dereference(engine->breadcrumbs.irq_seqno_bh);
> >-            if (tsk && tsk != current)
> >+            spin_lock_irqsave(&engine->breadcrumbs.lock, flags);
> >+            if (engine->breadcrumbs.first_wait &&
> >+                engine->breadcrumbs.first_wait->tsk != current)
> >                     /* Note that if the bottom-half is changed as we
> >                      * are sending the wake-up, the new bottom-half will
> >                      * be woken by whomever made the change. We only have
> >                      * to worry about when we steal the irq-posted for
> >                      * ourself.
> >                      */
> >-                    wake_up_process(tsk);
> >-            rcu_read_unlock();
> >+                    wake_up_process(engine->breadcrumbs.first_wait->tsk);
> >+            spin_unlock_irqrestore(&engine->breadcrumbs.lock, flags);
> 
> Worth caching &engine->breadcrumbs maybe?

I'll ask gcc.

> >
> >             if (__i915_gem_request_completed(req, seqno))
> >                     return true;
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >index e22eacec022d..2e7bdb0cf069 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >@@ -1083,6 +1083,7 @@ long i915_wait_request(struct drm_i915_gem_request *req,
> >     }
> >
> >     wait.tsk = current;
> >+    wait.request = req;
> 
> I guess this was in the preemption prep tree already. If you decide to
> keep the intel_wait_init helper, you could add req to it.

Challenge being not all users of intel_wait_init have a request :)

> > restart:
> >     do {
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> >index 5f73d8c0a38a..0efee879df23 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.h
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.h
> >@@ -32,10 +32,12 @@
> >
> > struct drm_file;
> > struct drm_i915_gem_object;
> >+struct drm_i915_gem_request;
> >
> > struct intel_wait {
> >     struct rb_node node;
> >     struct task_struct *tsk;
> >+    struct drm_i915_gem_request *request;
> >     u32 seqno;
> > };
> >
> >diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >index 57fa1bf78a85..0c370c687c2a 100644
> >--- a/drivers/gpu/drm/i915/i915_irq.c
> >+++ b/drivers/gpu/drm/i915/i915_irq.c
> >@@ -1033,10 +1033,44 @@ static void ironlake_rps_change_irq_handler(struct drm_i915_private *dev_priv)
> >
> > static void notify_ring(struct intel_engine_cs *engine)
> > {
> >+    struct drm_i915_gem_request *rq = NULL;
> >+    struct intel_wait *wait;
> >+
> >+    if (!intel_engine_has_waiter(engine))
> >+            return;
> >+
> >+    trace_i915_gem_request_notify(engine);
> >     atomic_inc(&engine->irq_count);
> >     set_bit(ENGINE_IRQ_BREADCRUMB, &engine->irq_posted);
> >-    if (intel_engine_wakeup(engine))
> >-            trace_i915_gem_request_notify(engine);
> >+
> >+    rcu_read_lock();
> 
> Not sure this is required from an irq handler. But I don't think it
> harms either, so maybe it has some usefulness in documenting things.

That's what I said! It's not, but it's cheap enough that I think it is
worth keeping to document the game being played with rq.

> Looks OK.
> 
> Last time I tested, it was a definite latency improvement but also
> a slight throughput regression on crazy microbenchmarks like
> gem_latency IIRC. But I think it's overall a good thing for more
> realistic workloads.

Yup. It's a big win for gem_exec_whisper, which is latency-sensitive
(and for gem_exec_latency). I didn't see gem_latency (using
tests/gem_latency) suffer too much. To be precise, which gem_latency
are you thinking of? :) I presume ezbench -b gem:latency?

I did look at trying to reduce the cost of the spinlocks (by trying to
share them between requests, or the engine->timeline and breadcrumbs)
but came away bruised and battered.

> I leave the ordering wrt the preemption prep series to you. I can
> apply my r-b at that point; not sure if it makes sense now?

It's in the queue, and I'll resend it once the earlier patches land so
that CI has a run over it, and to catch any more ideas.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre