On 21/05/2019 18:17, Chris Wilson wrote:
Quoting Lionel Landwerlin (2019-05-21 17:50:30)
On 21/05/2019 17:36, Chris Wilson wrote:
Quoting Lionel Landwerlin (2019-05-21 15:08:52)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f263a8374273..2ad95977f7a8 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2085,7 +2085,7 @@ static int gen9_emit_bb_start(struct i915_request *rq,
          if (IS_ERR(cs))
                  return PTR_ERR(cs);
-       *cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
+       *cs++ = MI_ARB_ON_OFF | rq->hw_context->arb_enable;
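For reference, the hunk above relies on the context carrying its own
arbitration setting; a rough sketch of how such a field might be declared
and populated (the field placement and the perf call site are assumptions
here, not necessarily what the series does):

/*
 * Sketch only, not from the series: assume struct intel_context gains an
 * arb_enable field holding the argument that gen9_emit_bb_start() ORs
 * into MI_ARB_ON_OFF above.
 */
struct intel_context {
	/* ... existing members ... */
	u32 arb_enable;		/* MI_ARB_ENABLE or MI_ARB_DISABLE */
};

/* Default for every context: leave batches preemptible. */
ce->arb_enable = MI_ARB_ENABLE;

/* Hypothetical perf path: while a query is pinned to this context,
 * disable arbitration so the measured batch is not preempted. */
ce->arb_enable = MI_ARB_DISABLE;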
My prediction is that this will result in this context being reset due
to preemption timeouts and the context under profile being banned.
Note that preemption timeouts will be the primary means for hang
detection for endless batches.
-Chris
Thanks,

One question: how is that dealt with for compute workloads at the moment?
I thought those were still not fully preemptible.
Not blocking is the condition under which they get to use endless...
Compute jobs are preemptible from gen9 afaik; gen8 was problematic, so it was
disabled there.
I need to rework this with a more "software" approach to holding off preemption.
Does adding a condition in intel_lrc.c's need_preempt() look like the right
direction? (sketched below)
Even less if that is our means of hangcheck.
-Chris
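
To make the need_preempt() suggestion above concrete, the condition being
proposed would look roughly like the early-out below (a sketch only; the
no_preempt flag is hypothetical, not an existing field in intel_context):

	/*
	 * Sketch of an early return near the top of need_preempt() in
	 * intel_lrc.c: if the running context has opted out of preemption
	 * (e.g. while a perf query is active on it), never consider
	 * preempting it, whatever the priority of the incoming request.
	 */
	if (rq->hw_context->no_preempt)
		return false;

That would move the no-preemption decision from the MI_ARB_ON_OFF programming
in the ring to the scheduler, which is presumably what the "software" approach
refers to; the question below is then what happens when such a context runs
for too long.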


Can we differentiate between a hangcheck and a high-priority request?

If I remember correctly, we can set the hangcheck timeout somewhere in /sys.

I think it's fine to ban the context doing a perf query if it's taking too long.


If a user runs into that scenario, we can tell them to increase the timeout.


-Lionel
