i915: Implement inter-engine read-read optimisations

Tvrtko Ursulin Tue, 14 Apr 2015 06:58:46 -0700


On 04/07/2015 04:20 PM, Chris Wilson wrote:

Currently, we only track the last request globally across all engines.
This prevents us from issuing concurrent read requests on e.g. the RCS
and BCS engines (or more likely the render and media engines). Without
semaphores, we incur costly stalls as we synchronise between rings -
greatly impacting the current performance of Broadwell versus Haswell in
certain workloads (like video decode). With the introduction of
reference counted requests, it is much easier to track the last request
per ring, as well as the last global write request so that we can
optimise inter-engine read read requests (as well as better optimise
certain CPU waits).


v2: Fix inverted readonly condition for nonblocking waits.
v3: Handle non-continguous engine array after waits
v4: Rebase, tidy, rewrite ring list debugging
v5: Use obj->active as a bitfield, it looks cool
v6: Micro-optimise, mostly involving moving code around
v7: Fix retire-requests-upto for execlists (and multiple rq->ringbuf)
v8: Rebase

I am still slightly concerned with the sequential ring req waiting incombination with optimistic spinning, but other than that looks good to me:


Reviewed-by: Tvrtko Ursulin <[email protected]>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
[email protected]
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 18/70] drm/i915: Implement inter-engine read-read optimisations

Reply via email to