Re: [Mesa-dev] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines

2019-03-26 Thread Chris Wilson
Quoting Kenneth Graunke (2019-03-26 17:01:57)
> On Tuesday, March 26, 2019 12:16:20 AM PDT Chris Wilson wrote:
> > Quoting Kenneth Graunke (2019-03-26 05:52:10)
> > > On Monday, March 25, 2019 3:58:59 AM PDT Chris Wilson wrote:
> > > > iris currently uses two distinct GEM contexts to have distinct logical
> > > > HW contexts for the compute and render pipelines. However, using two
> > > > distinct GEM contexts implies that they are distinct timelines, yet as
> > > > they are a single GL context that implies they belong to a single
> > > > timeline from the client perspective. Currently, fences are occasionally
> > > > inserted to order the two timelines. Using 2 GEM contexts, also implies
> > > > that we keep 2 ppGTT for identical buffer state. If we can create a
> > > > single GEM context, with the right capabilities, we can have a single
> > > > VM, a single timeline, but 2 logical HW contexts for the 2 pipelines.
> > > > 
> > > > This is allowed through the new context interface that allows VM to be
> > > > shared, timelines to be specified, and for the logical contexts to be
> > > > constructed as the user desires.
> > > > 
> > > > Cc: Joonas Lahtinen 
> > > > Cc: Kenneth Graunke 
> > > > ---
> > > >  src/gallium/drivers/iris/iris_batch.c   | 16 ++-
> > > >  src/gallium/drivers/iris/iris_batch.h   |  5 +--
> > > >  src/gallium/drivers/iris/iris_context.c | 56 -
> > > >  3 files changed, 60 insertions(+), 17 deletions(-)
> > > 
> > > Hi Chris,
> > > 
> > > I don't think that I want the single timeline option.  It seems like
> > > we've been moving away from implicit sync for a long time, and the
> > > explicit sync code we have is pretty straightforward and seems to do
> > > the trick.  Jason and I also chatted briefly, and we don't necessarily
> > > want to a strict submission-order between render/compute.
> > 
> > I disagree if you think this means more implicit sync. It is setting up
> > the GEM context to an exact match of the GL context, by _explicit_
> > control of the timeline. Then the fences you do export from inside the
> > GL context do not need to be faked to be a composite of the pair of
> > contexts. You still have explicit fences, and you have explicit control
> > over the definition of their timeline.
> 
> With regard to multiple GL contexts, yes, everything remains explicit.
> But having 2-3 separate timelines within a GL context allows us to
> reorder work behind GL's back, which is all the rage these days for
> performance.  Tilers do it all the time.  Position-only bucketing may
> require it.  I'd really like to start treating render and compute as
> distinct asynchronous queues.  At the very least, experimenting with
> that and not tying my hands to a particular behavior.

That's a reasonable argument. If you want to try and keep the GL
semantics intact while playing with ordering underneath, have fun!

The only problem I forsee if there is any observable through which the
pipelines can determine their ordering / concurrency (sampling a common
buffer or clock) that might construe a violation.
 
> There may be some use for single timeline, though.  Attaching images as
> compute shader inputs may require CCS/HiZ resolves, which have to happen
> on the RCS.  Right now, I do those on IRIS_BATCH_RENDER, which mean that
> it backs up behind any queued render work.  Ideally, I'd do those on a
> third context, which could be tied to the compute timeline, so the
> resolves and the compute job can both execute ahead of queued rendering,
> but still back to back.

I have an inkling that timelines should be first class for userspace to
control exactly. But I have not seen anything close to a use case to
justify that (yet). And by the time a usecase should arise, we will
probably be onto the next shiny. That's the problem with cloudy crystal
balls.

> > > Separating the VMA from the context state image seems like absolutely
> > > the right thing to do - as you said, they're separate in hardware,
> > > and no real reason to tie it together.  I would be in favor of new
> > > uABI for that.
> > > 
> > > I don't think there will be much overhead reduction from sharing the
> > > VMA here though.  It's very plausible that the compositor might want
> > > to run between render and compute batches, at which point we end up
> > > doing page directory loads anyway.  I have also heard rumors about bit
> > > 47 becoming magical at some point which may prohibit us from sharing...
> > 
> > Yeah, but that doesn't actually affect the context setup, just how you
> > decide to use it in end. And by that point, you'll be forced into using
> > this new uABI anyway or something entirely different :-p
> 
> Looking into this a bit more, I think we're actually OK.  I thought I
> might need to have distinct addresses for render and compute - at which
> point nearly every address would differ in terms of bit 47 - but it
> looks like the correct answer is "just never use that bit".  *shrug*


Re: [Mesa-dev] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines

2019-03-26 Thread Kenneth Graunke
On Tuesday, March 26, 2019 12:16:20 AM PDT Chris Wilson wrote:
> Quoting Kenneth Graunke (2019-03-26 05:52:10)
> > On Monday, March 25, 2019 3:58:59 AM PDT Chris Wilson wrote:
> > > iris currently uses two distinct GEM contexts to have distinct logical
> > > HW contexts for the compute and render pipelines. However, using two
> > > distinct GEM contexts implies that they are distinct timelines, yet as
> > > they are a single GL context that implies they belong to a single
> > > timeline from the client perspective. Currently, fences are occasionally
> > > inserted to order the two timelines. Using 2 GEM contexts, also implies
> > > that we keep 2 ppGTT for identical buffer state. If we can create a
> > > single GEM context, with the right capabilities, we can have a single
> > > VM, a single timeline, but 2 logical HW contexts for the 2 pipelines.
> > > 
> > > This is allowed through the new context interface that allows VM to be
> > > shared, timelines to be specified, and for the logical contexts to be
> > > constructed as the user desires.
> > > 
> > > Cc: Joonas Lahtinen 
> > > Cc: Kenneth Graunke 
> > > ---
> > >  src/gallium/drivers/iris/iris_batch.c   | 16 ++-
> > >  src/gallium/drivers/iris/iris_batch.h   |  5 +--
> > >  src/gallium/drivers/iris/iris_context.c | 56 -
> > >  3 files changed, 60 insertions(+), 17 deletions(-)
> > 
> > Hi Chris,
> > 
> > I don't think that I want the single timeline option.  It seems like
> > we've been moving away from implicit sync for a long time, and the
> > explicit sync code we have is pretty straightforward and seems to do
> > the trick.  Jason and I also chatted briefly, and we don't necessarily
> > want to a strict submission-order between render/compute.
> 
> I disagree if you think this means more implicit sync. It is setting up
> the GEM context to an exact match of the GL context, by _explicit_
> control of the timeline. Then the fences you do export from inside the
> GL context do not need to be faked to be a composite of the pair of
> contexts. You still have explicit fences, and you have explicit control
> over the definition of their timeline.

With regard to multiple GL contexts, yes, everything remains explicit.
But having 2-3 separate timelines within a GL context allows us to
reorder work behind GL's back, which is all the rage these days for
performance.  Tilers do it all the time.  Position-only bucketing may
require it.  I'd really like to start treating render and compute as
distinct asynchronous queues.  At the very least, experimenting with
that and not tying my hands to a particular behavior.

There may be some use for single timeline, though.  Attaching images as
compute shader inputs may require CCS/HiZ resolves, which have to happen
on the RCS.  Right now, I do those on IRIS_BATCH_RENDER, which mean that
it backs up behind any queued render work.  Ideally, I'd do those on a
third context, which could be tied to the compute timeline, so the
resolves and the compute job can both execute ahead of queued rendering,
but still back to back.

> > Separating the VMA from the context state image seems like absolutely
> > the right thing to do - as you said, they're separate in hardware,
> > and no real reason to tie it together.  I would be in favor of new
> > uABI for that.
> > 
> > I don't think there will be much overhead reduction from sharing the
> > VMA here though.  It's very plausible that the compositor might want
> > to run between render and compute batches, at which point we end up
> > doing page directory loads anyway.  I have also heard rumors about bit
> > 47 becoming magical at some point which may prohibit us from sharing...
> 
> Yeah, but that doesn't actually affect the context setup, just how you
> decide to use it in end. And by that point, you'll be forced into using
> this new uABI anyway or something entirely different :-p

Looking into this a bit more, I think we're actually OK.  I thought I
might need to have distinct addresses for render and compute - at which
point nearly every address would differ in terms of bit 47 - but it
looks like the correct answer is "just never use that bit".  *shrug*

> > Context cloning seems OK, but I'm always pretty hesitant to add new
> > uABI unless it's strictly necessary.  In this case, we can do the same
> > thing with a little bit of userspace code, so I'm not sure it's worth
> > adding that...
> 
> Actually you cannot do the same without some of the new uABI either,
> since previously you did not have all the parameters exposed.

What isn't exposed?  We set up everything the first time, why can't we
do it again?

> > I would love to see an iris patch to use the new
> > I915_CONTEXT_PARAM_RECOVERABLE option without the other dependencies.
> 
> https://gitlab.freedesktop.org/ickle/mesa/commit/84d9cb1d8d98a50dcceea19ccbc3836b15cf73ae
> -Chris


signature.asc
Description: This is a digitally signed message part.

Re: [Mesa-dev] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines

2019-03-26 Thread Chris Wilson
Quoting Kenneth Graunke (2019-03-26 05:52:10)
> On Monday, March 25, 2019 3:58:59 AM PDT Chris Wilson wrote:
> > iris currently uses two distinct GEM contexts to have distinct logical
> > HW contexts for the compute and render pipelines. However, using two
> > distinct GEM contexts implies that they are distinct timelines, yet as
> > they are a single GL context that implies they belong to a single
> > timeline from the client perspective. Currently, fences are occasionally
> > inserted to order the two timelines. Using 2 GEM contexts, also implies
> > that we keep 2 ppGTT for identical buffer state. If we can create a
> > single GEM context, with the right capabilities, we can have a single
> > VM, a single timeline, but 2 logical HW contexts for the 2 pipelines.
> > 
> > This is allowed through the new context interface that allows VM to be
> > shared, timelines to be specified, and for the logical contexts to be
> > constructed as the user desires.
> > 
> > Cc: Joonas Lahtinen 
> > Cc: Kenneth Graunke 
> > ---
> >  src/gallium/drivers/iris/iris_batch.c   | 16 ++-
> >  src/gallium/drivers/iris/iris_batch.h   |  5 +--
> >  src/gallium/drivers/iris/iris_context.c | 56 -
> >  3 files changed, 60 insertions(+), 17 deletions(-)
> 
> Hi Chris,
> 
> I don't think that I want the single timeline option.  It seems like
> we've been moving away from implicit sync for a long time, and the
> explicit sync code we have is pretty straightforward and seems to do
> the trick.  Jason and I also chatted briefly, and we don't necessarily
> want to a strict submission-order between render/compute.

I disagree if you think this means more implicit sync. It is setting up
the GEM context to an exact match of the GL context, by _explicit_
control of the timeline. Then the fences you do export from inside the
GL context do not need to be faked to be a composite of the pair of
contexts. You still have explicit fences, and you have explicit control
over the definition of their timeline.

> Separating the VMA from the context state image seems like absolutely
> the right thing to do - as you said, they're separate in hardware,
> and no real reason to tie it together.  I would be in favor of new
> uABI for that.
> 
> I don't think there will be much overhead reduction from sharing the
> VMA here though.  It's very plausible that the compositor might want
> to run between render and compute batches, at which point we end up
> doing page directory loads anyway.  I have also heard rumors about bit
> 47 becoming magical at some point which may prohibit us from sharing...

Yeah, but that doesn't actually affect the context setup, just how you
decide to use it in end. And by that point, you'll be forced into using
this new uABI anyway or something entirely different :-p
 
> Context cloning seems OK, but I'm always pretty hesitant to add new
> uABI unless it's strictly necessary.  In this case, we can do the same
> thing with a little bit of userspace code, so I'm not sure it's worth
> adding that...

Actually you cannot do the same without some of the new uABI either,
since previously you did not have all the parameters exposed.
 
> I would love to see an iris patch to use the new
> I915_CONTEXT_PARAM_RECOVERABLE option without the other dependencies.

https://gitlab.freedesktop.org/ickle/mesa/commit/84d9cb1d8d98a50dcceea19ccbc3836b15cf73ae
-Chris
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

Re: [Mesa-dev] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines

2019-03-25 Thread Kenneth Graunke
On Monday, March 25, 2019 3:58:59 AM PDT Chris Wilson wrote:
> iris currently uses two distinct GEM contexts to have distinct logical
> HW contexts for the compute and render pipelines. However, using two
> distinct GEM contexts implies that they are distinct timelines, yet as
> they are a single GL context that implies they belong to a single
> timeline from the client perspective. Currently, fences are occasionally
> inserted to order the two timelines. Using 2 GEM contexts, also implies
> that we keep 2 ppGTT for identical buffer state. If we can create a
> single GEM context, with the right capabilities, we can have a single
> VM, a single timeline, but 2 logical HW contexts for the 2 pipelines.
> 
> This is allowed through the new context interface that allows VM to be
> shared, timelines to be specified, and for the logical contexts to be
> constructed as the user desires.
> 
> Cc: Joonas Lahtinen 
> Cc: Kenneth Graunke 
> ---
>  src/gallium/drivers/iris/iris_batch.c   | 16 ++-
>  src/gallium/drivers/iris/iris_batch.h   |  5 +--
>  src/gallium/drivers/iris/iris_context.c | 56 -
>  3 files changed, 60 insertions(+), 17 deletions(-)

Hi Chris,

I don't think that I want the single timeline option.  It seems like
we've been moving away from implicit sync for a long time, and the
explicit sync code we have is pretty straightforward and seems to do
the trick.  Jason and I also chatted briefly, and we don't necessarily
want to a strict submission-order between render/compute.

Separating the VMA from the context state image seems like absolutely
the right thing to do - as you said, they're separate in hardware,
and no real reason to tie it together.  I would be in favor of new
uABI for that.

I don't think there will be much overhead reduction from sharing the
VMA here though.  It's very plausible that the compositor might want
to run between render and compute batches, at which point we end up
doing page directory loads anyway.  I have also heard rumors about bit
47 becoming magical at some point which may prohibit us from sharing...

Context cloning seems OK, but I'm always pretty hesitant to add new
uABI unless it's strictly necessary.  In this case, we can do the same
thing with a little bit of userspace code, so I'm not sure it's worth
adding that...

I would love to see an iris patch to use the new
I915_CONTEXT_PARAM_RECOVERABLE option without the other dependencies.

--Ken


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev

[Mesa-dev] [PATCH 2/3] iris: Create a composite context for both compute and render pipelines

2019-03-25 Thread Chris Wilson
iris currently uses two distinct GEM contexts to have distinct logical
HW contexts for the compute and render pipelines. However, using two
distinct GEM contexts implies that they are distinct timelines, yet as
they are a single GL context that implies they belong to a single
timeline from the client perspective. Currently, fences are occasionally
inserted to order the two timelines. Using 2 GEM contexts, also implies
that we keep 2 ppGTT for identical buffer state. If we can create a
single GEM context, with the right capabilities, we can have a single
VM, a single timeline, but 2 logical HW contexts for the 2 pipelines.

This is allowed through the new context interface that allows VM to be
shared, timelines to be specified, and for the logical contexts to be
constructed as the user desires.

Cc: Joonas Lahtinen 
Cc: Kenneth Graunke 
---
 src/gallium/drivers/iris/iris_batch.c   | 16 ++-
 src/gallium/drivers/iris/iris_batch.h   |  5 +--
 src/gallium/drivers/iris/iris_context.c | 56 -
 3 files changed, 60 insertions(+), 17 deletions(-)

diff --git a/src/gallium/drivers/iris/iris_batch.c 
b/src/gallium/drivers/iris/iris_batch.c
index 556422c38bc..40bcd939795 100644
--- a/src/gallium/drivers/iris/iris_batch.c
+++ b/src/gallium/drivers/iris/iris_batch.c
@@ -164,24 +164,16 @@ iris_init_batch(struct iris_batch *batch,
 struct iris_vtable *vtbl,
 struct pipe_debug_callback *dbg,
 struct iris_batch *all_batches,
-enum iris_batch_name name,
-uint8_t engine,
-int priority)
+uint32_t hw_id,
+enum iris_batch_name name)
 {
batch->screen = screen;
batch->vtbl = vtbl;
batch->dbg = dbg;
batch->name = name;
 
-   /* engine should be one of I915_EXEC_RENDER, I915_EXEC_BLT, etc. */
-   assert((engine & ~I915_EXEC_RING_MASK) == 0);
-   assert(util_bitcount(engine) == 1);
-   batch->engine = engine;
-
-   batch->hw_ctx_id = iris_create_hw_context(screen->bufmgr);
-   assert(batch->hw_ctx_id);
-
-   iris_hw_context_set_priority(screen->bufmgr, batch->hw_ctx_id, priority);
+   batch->hw_ctx_id = hw_id;
+   batch->engine = name;
 
util_dynarray_init(>exec_fences, ralloc_context(NULL));
util_dynarray_init(>syncpts, ralloc_context(NULL));
diff --git a/src/gallium/drivers/iris/iris_batch.h 
b/src/gallium/drivers/iris/iris_batch.h
index 2a68103c379..db1a4cbbe11 100644
--- a/src/gallium/drivers/iris/iris_batch.h
+++ b/src/gallium/drivers/iris/iris_batch.h
@@ -131,9 +131,8 @@ void iris_init_batch(struct iris_batch *batch,
  struct iris_vtable *vtbl,
  struct pipe_debug_callback *dbg,
  struct iris_batch *all_batches,
- enum iris_batch_name name,
- uint8_t ring,
- int priority);
+ uint32_t hw_id,
+ enum iris_batch_name name);
 void iris_chain_to_new_batch(struct iris_batch *batch);
 void iris_batch_free(struct iris_batch *batch);
 void iris_batch_maybe_flush(struct iris_batch *batch, unsigned estimate);
diff --git a/src/gallium/drivers/iris/iris_context.c 
b/src/gallium/drivers/iris/iris_context.c
index a1d11755a24..aeb608c70bd 100644
--- a/src/gallium/drivers/iris/iris_context.c
+++ b/src/gallium/drivers/iris/iris_context.c
@@ -143,6 +143,57 @@ iris_destroy_context(struct pipe_context *ctx)
   unreachable("Unknown hardware generation"); \
}
 
+static void create_hw_contexts(struct iris_screen *screen,
+   uint32_t *hw_id,
+   int priority)
+{
+#define RCS0 {0, 0}
+   I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
+  .class_instance = { RCS0, RCS0 }
+   };
+   struct drm_i915_gem_context_create_ext_setparam p_engines = {
+  .base = {
+ .name = I915_CONTEXT_CREATE_EXT_SETPARAM,
+.next_extension = 0, /* end of chain */
+  },
+  .param = {
+ .param = I915_CONTEXT_PARAM_ENGINES,
+ .value = (uintptr_t),
+ .size = sizeof(engines),
+  },
+   };
+   struct drm_i915_gem_context_create_ext_setparam p_prio = {
+  .base = {
+ .name =I915_CONTEXT_CREATE_EXT_SETPARAM,
+ .next_extension = (uintptr_t)_engines,
+  },
+  .param = {
+ .param = I915_CONTEXT_PARAM_PRIORITY,
+ .value = priority,
+  },
+   };
+   struct drm_i915_gem_context_create_ext arg = {
+  .flags = I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE |
+  I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
+  .extensions = (uintptr_t)_prio,
+   };
+
+   if (drm_ioctl(screen->fd,
+ DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT,
+ ) == 0) {
+  for (int i = 0; i < IRIS_BATCH_COUNT; i++)
+ hw_id[i] = arg.ctx_id;
+  return;
+   }
+
+   /* No fancy new features; create two contexts instead */
+   for (int i = 0; i < IRIS_BATCH_COUNT; i++) {
+