On 25/09/2019 13:51, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-09-25 13:41:10)
[+ Daniele, I think he might want to have a look at this.]

On 25/09/2019 11:01, Chris Wilson wrote:
With the introduction of ctx->engines[] we allow multiple logical
contexts to be used on the same engine (e.g. with virtual engines). Each
logical context requires a unique tag in order for context-switching to
occur correctly between them.

We only need to keep a unique tag for the active lifetime of the
context, and for as long as we need to identify that context. The HW
uses the tag to determine if it should use a lite-restore (why not the
LRCA?) and passes the tag back for various status identifies. The only
status we need to track is for OA, so when using perf, we assign the
specific context a unique tag.

Fixes: 976b55f0e1db ("drm/i915: Allow a context to define its set of engines")

Fixes? Why?

The above paragraph explains that as we give two distinct contexts the
same tag then it will only perform (according to bspec) a lite-restore.

Right I see. But not with virtual engines per se, but even if someone created an engine map with two same engines in it. Virtual engines are then a super set of this problem.

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h 
b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index da4b45aa84b1..7bce176e696c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -296,10 +296,12 @@ struct intel_engine_cs {
       u8 uabi_class;
       u8 uabi_instance;
+ u32 uabi_capabilities;
       u32 context_size;
       u32 mmio_base;
- u32 uabi_capabilities;
+     unsigned int context_tag;
+#define NUM_CONTEXT_TAG (256)

AFAICT this define now defines how deep ELSP we can have before we start
getting hard to debug problems. GuC angle needs considering here.

The guc is immaterial here, since they use their own mechanism. It just
needs to be a value larger than 2 * EXECLIST_MAX_PORTS and less then
BIT(10).

I don't know how GuC (will) work so hopefully Daniele can confirm or deny this won't be a problem there.

@@ -978,6 +965,15 @@ __execlists_schedule_in(struct i915_request *rq)
intel_context_get(ce); + if (ce->tag) {
+             ce->lrc_desc |= (u64)ce->tag << 32;
+     } else {
+             ce->lrc_desc &= ~GENMASK_ULL(47, 37);
+             ce->lrc_desc |=
+                     (u64)(engine->context_tag++ % NUM_CONTEXT_TAG) <<
+                     GEN11_SW_CTX_ID_SHIFT;
+     }

So hw id is valid only while context is in ELSP and it changes with
every submission except in the OA case?

Aye. OA being a pita.
Shifts and masks look dodgy. For >= gen11 the current code has the hw_id
in [37, 42], otherwise [32, 52]. This patch does not seem to handle the
differences between gens.

Because the values we supply do not matter (they are cookies), just the
lifetime.

Okay so you are choosing to not use some of the available bits since we don't need all 20 (pre gen11), or even 10 (gen11+). This definitely deserves a comment. And another assert with NUM_CONTEXT_TAG relating to available bits.


+
       intel_gt_pm_get(engine->gt);
       execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
       intel_engine_context_in(engine);
@@ -2224,7 +2220,6 @@ static void execlists_context_unpin(struct intel_context 
*ce)
       check_redzone((void *)ce->lrc_reg_state - LRC_STATE_PN * PAGE_SIZE,
                     ce->engine);
- i915_gem_context_unpin_hw_id(ce->gem_context);
       i915_gem_object_unpin_map(ce->state->obj);
       intel_ring_reset(ce->ring, ce->ring->tail);
   }
@@ -2274,18 +2269,12 @@ __execlists_context_pin(struct intel_context *ce,
               goto unpin_active;
       }
- ret = i915_gem_context_pin_hw_id(ce->gem_context);
-     if (ret)
-             goto unpin_map;
-
       ce->lrc_desc = lrc_descriptor(ce, engine);
       ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
       __execlists_update_reg_state(ce, engine);
return 0; -unpin_map:
-     i915_gem_object_unpin_map(ce->state->obj);
   unpin_active:
       intel_context_active_release(ce);
   err:
diff --git a/drivers/gpu/drm/i915/gvt/kvmgt.c b/drivers/gpu/drm/i915/gvt/kvmgt.c
index 343d79c1cb7e..04a5a0d90823 100644
--- a/drivers/gpu/drm/i915/gvt/kvmgt.c
+++ b/drivers/gpu/drm/i915/gvt/kvmgt.c
@@ -1564,27 +1564,10 @@ vgpu_id_show(struct device *dev, struct 
device_attribute *attr,
       return sprintf(buf, "\n");
   }
-static ssize_t
-hw_id_show(struct device *dev, struct device_attribute *attr,
-        char *buf)
-{
-     struct mdev_device *mdev = mdev_from_dev(dev);
-
-     if (mdev) {
-             struct intel_vgpu *vgpu = (struct intel_vgpu *)
-                     mdev_get_drvdata(mdev);
-             return sprintf(buf, "%u\n",
-                            vgpu->submission.shadow[0]->gem_context->hw_id);
-     }
-     return sprintf(buf, "\n");
-}
-
   static DEVICE_ATTR_RO(vgpu_id);
-static DEVICE_ATTR_RO(hw_id);
static struct attribute *intel_vgpu_attrs[] = {
       &dev_attr_vgpu_id.attr,
-     &dev_attr_hw_id.attr,
       NULL
   };
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index fd41505d52ec..8caaa446490f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1504,9 +1504,6 @@ static int i915_context_status(struct seq_file *m, void 
*unused)
               struct intel_context *ce;
seq_puts(m, "HW context ");
-             if (!list_empty(&ctx->hw_id_link))
-                     seq_printf(m, "%x [pin %u]", ctx->hw_id,
-                                atomic_read(&ctx->hw_id_pin_count));
               if (ctx->pid) {
                       struct task_struct *task;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a86d28bda6dd..47239df653f2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -471,9 +471,9 @@ static void error_print_context(struct 
drm_i915_error_state_buf *m,
                               const char *header,
                               const struct drm_i915_error_context *ctx)
   {
-     err_printf(m, "%s%s[%d] hw_id %d, prio %d, guilty %d active %d\n",
-                header, ctx->comm, ctx->pid, ctx->hw_id,
-                ctx->sched_attr.priority, ctx->guilty, ctx->active);
+     err_printf(m, "%s%s[%d] prio %d, guilty %d active %d\n",
+                header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
+                ctx->guilty, ctx->active);
   }
static void error_print_engine(struct drm_i915_error_state_buf *m,
@@ -1262,7 +1262,6 @@ static bool record_context(struct drm_i915_error_context 
*e,
               rcu_read_unlock();
       }
- e->hw_id = ctx->hw_id;
       e->sched_attr = ctx->sched;
       e->guilty = atomic_read(&ctx->guilty_count);
       e->active = atomic_read(&ctx->active_count);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h 
b/drivers/gpu/drm/i915/i915_gpu_error.h
index caaed5093d95..4dc36d6ee3a2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -117,7 +117,6 @@ struct i915_gpu_state {
               struct drm_i915_error_context {
                       char comm[TASK_COMM_LEN];
                       pid_t pid;
-                     u32 hw_id;
                       int active;
                       int guilty;
                       struct i915_sched_attr sched_attr;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 80055501eccb..d36ba248d438 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1283,22 +1283,15 @@ static int oa_get_render_ctx_id(struct i915_perf_stream 
*stream)
               } else {
                       stream->specific_ctx_id_mask =
                               (1U << GEN8_CTX_ID_WIDTH) - 1;
-                     stream->specific_ctx_id =
-                             upper_32_bits(ce->lrc_desc);
-                     stream->specific_ctx_id &=
-                             stream->specific_ctx_id_mask;
+                     stream->specific_ctx_id = stream->specific_ctx_id_mask;
               }
               break;
case 11:
       case 12: {
               stream->specific_ctx_id_mask =
-                     ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << 
(GEN11_SW_CTX_ID_SHIFT - 32) |
-                     ((1U << GEN11_ENGINE_INSTANCE_WIDTH) - 1) << 
(GEN11_ENGINE_INSTANCE_SHIFT - 32) |
-                     ((1 << GEN11_ENGINE_CLASS_WIDTH) - 1) << 
(GEN11_ENGINE_CLASS_SHIFT - 32);
-             stream->specific_ctx_id = upper_32_bits(ce->lrc_desc);
-             stream->specific_ctx_id &=
-                     stream->specific_ctx_id_mask;
+                     ((1U << GEN11_SW_CTX_ID_WIDTH) - 1) << 
(GEN11_SW_CTX_ID_SHIFT - 32);
+             stream->specific_ctx_id = stream->specific_ctx_id_mask;
               break;
       }
@@ -1306,6 +1299,8 @@ static int oa_get_render_ctx_id(struct i915_perf_stream *stream)
               MISSING_CASE(INTEL_GEN(i915));
       }
+ ce->tag = stream->specific_ctx_id_mask;
+
       DRM_DEBUG_DRIVER("filtering on ctx_id=0x%x ctx_id_mask=0x%x\n",
                        stream->specific_ctx_id,
                        stream->specific_ctx_id_mask);
@@ -1324,12 +1319,14 @@ static void oa_put_render_ctx_id(struct 
i915_perf_stream *stream)
   {
       struct intel_context *ce;
- stream->specific_ctx_id = INVALID_CTX_ID;
-     stream->specific_ctx_id_mask = 0;
-
       ce = fetch_and_zero(&stream->pinned_ctx);
-     if (ce)
+     if (ce) {
+             ce->tag = 0;
               intel_context_unpin(ce);
+     }
+
+     stream->specific_ctx_id = INVALID_CTX_ID;
+     stream->specific_ctx_id_mask = 0;

OA hunks looks dodgy as well. ce->tag is set to
stream->specific_ctx_id_mask while I think it should be id. Furthermore
id is assigned from mask which is fixed and not unique for different
contexts.

It is singular in the entire system. The mask is a guaranteed unique
value.

What is singular? The OA context and it's id? I assumed what needs to be done is fix hw_id for _all_ contexts while OA is sampling. But on a more detailed look this assumption does not seem to hold in code. Okay, I declare to not know how OA works well enough.


diff --git a/drivers/gpu/drm/i915/selftests/i915_vma.c 
b/drivers/gpu/drm/i915/selftests/i915_vma.c
index 1c9db08f7c28..ac1ff558eb90 100644
--- a/drivers/gpu/drm/i915/selftests/i915_vma.c
+++ b/drivers/gpu/drm/i915/selftests/i915_vma.c
@@ -170,7 +170,7 @@ static int igt_vma_create(void *arg)
               }
nc = 0;
-             for_each_prime_number(num_ctx, MAX_CONTEXT_HW_ID) {
+             for_each_prime_number(num_ctx, 2 * NUM_CONTEXT_TAG) {
                       for (; nc < num_ctx; nc++) {
                               ctx = mock_context(i915, "mock");
                               if (!ctx)


Unless I am missing something I see this patch as simplification and not
a fix. If we can get away with it in the GuC world then it's quite a big
simplification so it's fine by me.

Bspec says that for lrc_desc with matching tags (at least for gen8-gen11
I believe), it performs a lite restore (regardless of lrca). Ergo it is
quite a major fix.

Yes I see that now, my bad. This one will not lend itself well to backporting.. But I don't think there is a lighter-weight solution. Moving hw_id to intel_context would also end up large-ish I think. Because of the locking requirement etc.

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to