On Mon, Dec 07, 2020 at 07:38:10PM +0000, Chris Wilson wrote:
> The issue with stale virtual breadcrumbs remains. Now we have the problem
> that if the irq-signaler is still referencing the stale breadcrumb as we
> transfer it to a new sibling, the list becomes spaghetti. This is a very
> small window, but that doesn't stop it from being hit occasionally. To
> prevent the lists being tangled (the iterator starting on one engine's
> b->signalers but walking onto another list), always decouple the virtual
> breadcrumb on schedule-out and make sure that the walker has stepped out
> of the lists.
> 
> Signed-off-by: Chris Wilson <[email protected]>

Makes sense to me.
Reviewed-by: Matthew Brost <[email protected]>

> ---
>  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  5 +++--
>  drivers/gpu/drm/i915/gt/intel_lrc.c         | 15 +++++++++++++++
>  2 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 00918300f53f..63900edbde88 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -454,15 +454,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
>  {
>       struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
>       struct intel_context *ce = rq->context;
> +     unsigned long flags;
>       bool release;
>  
>       if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
>               return;
>  
> -     spin_lock(&ce->signal_lock);
> +     spin_lock_irqsave(&ce->signal_lock, flags);
>       list_del_rcu(&rq->signal_link);
>       release = remove_signaling_context(b, ce);
> -     spin_unlock(&ce->signal_lock);
> +     spin_unlock_irqrestore(&ce->signal_lock, flags);
>       if (release)
>               intel_context_put(ce);
>  
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 0b7e86240f3b..b3db16b2a5a4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1388,6 +1388,21 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx)
>  static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>  {
>       struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> +     struct intel_engine_cs *engine = rq->engine;
> +
> +     /* Flush concurrent rcu iterators in signal_irq_work */
> +     if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) {
> +             /*
> +              * After this point, the rq may be transferred to a new
> +              * sibling, so before we clear ce->inflight make sure that
> +              * the context has been removed from the b->signalers and
> +              * furthermore we need to make sure that the concurrent
> +              * iterator in signal_irq_work is no longer following
> +              * ce->signal_link.
> +              */
> +             i915_request_cancel_breadcrumb(rq);
> +             irq_work_sync(&engine->breadcrumbs->irq_work);
> +     }
>  
>       if (READ_ONCE(ve->request))
>               tasklet_hi_schedule(&ve->base.execlists.tasklet);
> -- 
> 2.20.1
> 
> _______________________________________________
> Intel-gfx mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx