Chris Wilson <[email protected]> writes:

> In what seems remarkably similar to the w/a required to not reload an
> idle context with HEAD==TAIL, it appears we must prevent the HW from
> switching to an idle context in ELSP[1], while simultaneously trying to
> preempt the HW to run another context and a continuation of the idle
> context (which is no longer idle).
>
> We can achieve this by preventing the context from completing while we
> reload a new ELSP (by applying ring_set_paused(1) across the whole of
> dequeue), except this eventually fails due to a lite-restore into a
> waiting semaphore does not generate an ACK. Instead, we try to avoid
> making the GPU do anything too challenging and not submit a new ELSP
> while the interrupts + CSB events appear to have fallen behind the
> completed contexts. We expect it to catch up shortly so we queue another
> tasklet execution and hope for the best.
>
> Closes: https://gitlab.freedesktop.org/drm/intel/issues/1501
> Signed-off-by: Chris Wilson <[email protected]>
> Cc: Tvrtko Ursulin <[email protected]>
> Cc: Mika Kuoppala <[email protected]>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 26 +++++++++++++++++++++++---
>  1 file changed, 23 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
> b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index b12355048501..5f17ece07858 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1915,11 +1915,26 @@ static void execlists_dequeue(struct intel_engine_cs 
> *engine)
>        * of trouble.
>        */
>       active = READ_ONCE(execlists->active);
> -     while ((last = *active) && i915_request_completed(last))
> -             active++;
>  
> -     if (last) {
> +     /*
> +      * In theory we can skip over completed contexts that have not
> +      * yet been processed by events (as those events are in flight):
> +      *
> +      * while ((last = *active) && i915_request_completed(last))
> +      *      active++;
> +      *
> +      * However, the GPU is cannot handle this as it will ultimately

s/is//

I applaud the straightforward nature of this compared to the pausing.
Albeit this seems to have a cost. 

But this should be quite rare event comparatively?

> +      * find itself trying to jump back into a context it has just
> +      * completed and barf.
> +      */
> +
> +     if ((last = *active)) {
>               if (need_preempt(engine, last, rb)) {
> +                     if (i915_request_completed(last)) {
> +                             tasklet_hi_schedule(&execlists->tasklet);
> +                             return;
> +                     }
> +

I was pondering of the lost tracing and if you can
work it backwards to this condition.

But I really hope this nails it,
Reviewed-by: Mika Kuoppala <[email protected]>

>                       ENGINE_TRACE(engine,
>                                    "preempting last=%llx:%lld, prio=%d, 
> hint=%d\n",
>                                    last->fence.context,
> @@ -1947,6 +1962,11 @@ static void execlists_dequeue(struct intel_engine_cs 
> *engine)
>                       last = NULL;
>               } else if (need_timeslice(engine, last) &&
>                          timer_expired(&engine->execlists.timer)) {
> +                     if (i915_request_completed(last)) {
> +                             tasklet_hi_schedule(&execlists->tasklet);
> +                             return;
> +                     }
> +
>                       ENGINE_TRACE(engine,
>                                    "expired last=%llx:%lld, prio=%d, 
> hint=%d\n",
>                                    last->fence.context,
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to