We dropped calling process_csb() prior to handling direct submission in
order to avoid nesting spinlocks and to lift process_csb() and the
majority of the tasklet out of irq-off. However, we do want to avoid
ksoftirqd latency in the fast path, so try to pull the interrupt-bh
local to direct submission if we can acquire the tasklet's lock.

v2: Document the read of pending[0] from outside the tasklet with
READ_ONCE().

Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursu...@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c 
index f88d3b95c4e1..d49baade0986 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2891,6 +2891,13 @@ static void __submit_queue_imm(struct intel_engine_cs 
        if (reset_in_progress(execlists))
                return; /* defer until we restart the engine following reset */
+       /* Hopefully we clear execlists->pending[] to let us through */
+       if (READ_ONCE(execlists->pending[0]) &&
+           tasklet_trylock(&execlists->tasklet)) {
+               process_csb(engine);
+               tasklet_unlock(&execlists->tasklet);
+       }
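
The trylock-then-process pattern in the hunk above can be sketched in
userspace C11 as follows. This is only an illustration under stated
assumptions, not the i915 implementation: struct bh, struct engine,
bh_trylock(), bh_unlock(), process_events() and submit_direct() are
hypothetical stand-ins for the tasklet, the engine, tasklet_trylock(),
tasklet_unlock(), process_csb() and __submit_queue_imm(), and an atomic
boolean stands in for the tasklet's run state.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the tasklet: one flag meaning "bh is running". */
struct bh {
	atomic_bool running;
};

/* Hypothetical stand-in for the engine and its execlists state. */
struct engine {
	struct bh bh;
	atomic_int pending;	/* models execlists->pending[0] */
	int processed;		/* how many times the event handler ran */
};

/* Try to take ownership of the bottom half; never spins or sleeps. */
static bool bh_trylock(struct bh *bh)
{
	return !atomic_exchange_explicit(&bh->running, true,
					 memory_order_acquire);
}

static void bh_unlock(struct bh *bh)
{
	atomic_store_explicit(&bh->running, false, memory_order_release);
}

/* Models process_csb(): retire pending work; caller must own the bh. */
static void process_events(struct engine *e)
{
	assert(atomic_load(&e->bh.running));
	atomic_store_explicit(&e->pending, 0, memory_order_relaxed);
	e->processed++;
}

/*
 * Models the direct-submission fast path: peek at pending without the
 * lock (the READ_ONCE() analogue) and, only if the bottom half is not
 * already running elsewhere, process events inline instead of waiting
 * for it to be scheduled. Returns true if submission may proceed.
 */
static bool submit_direct(struct engine *e)
{
	if (atomic_load_explicit(&e->pending, memory_order_relaxed) &&
	    bh_trylock(&e->bh)) {
		process_events(e);
		bh_unlock(&e->bh);
	}

	return atomic_load_explicit(&e->pending, memory_order_relaxed) == 0;
}
```

If the trylock fails, the bottom half is already running on another
CPU, so the sketch simply falls back to the existing deferred path
rather than spinning, which is the point of using a trylock here.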

