[Mesa-dev] Freenode fallout

2021-05-20 Thread Lyude Paul
Hi everyone! As I'm sure most of you have heard by now, Freenode's staff
have had a falling out with the network's new ownership, and the departing
staff have recommended that projects consider the network a hostile entity.
I won't go into the details here, but those who are interested can read up
here:

https://lwn.net/Articles/856543/

At the moment, the vast majority of IRC channels for the various Freedesktop
and X.org projects reside on Freenode. While the X.org Foundation doesn't
have any official policy on IRC hosting, IRC is used so frequently by
projects in our community that we on the board decided to make a
non-binding recommendation for an IRC network we think would be good to
move to. We're also looking at ways to provide resources to help channels
move en masse, so that interested projects can migrate to the same new
network and end up in the same place.

After considering Libera and OFTC as options, the board settled on
recommending OFTC. The primary reason is that OFTC is associated with our
parent foundation, SPI, and has a long and well-known history of
involvement with the open source community. The board also believes OFTC's
current governance model is much clearer than Libera's.

-- 
Sincerely,
   Lyude Paul (she/her)
   Software Engineer at Red Hat
   
Note: I deal with a lot of emails and have a lot of bugs on my plate. If you've
asked me a question, are waiting for a review/merge on a patch, etc. and I
haven't responded in a while, please feel free to send me another email to check
on my status. I don't bite!



Re: [Mesa-dev] [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Jason Ekstrand
On Thu, May 20, 2021 at 10:46 AM Matthew Brost  wrote:
>
> On Thu, May 20, 2021 at 01:11:59PM +0200, Christian König wrote:
> > > On 19.05.21 18:51, Matthew Brost wrote:
> > > On Wed, May 19, 2021 at 01:45:39PM +0200, Christian König wrote:
> > > > Oh, yeah we call that gang submit on the AMD side.
> > > >
> > > > Had already some internal discussions how to implement this, but so far
> > > > couldn't figure out how to cleanly introduce that into the DRM 
> > > > scheduler.
> > > >
> > > > Can you briefly describe in a few words how that is supposed to work on 
> > > > the
> > > > Intel side?

On Intel, we actually have two cases which don't fit the current
drm/scheduler model well: balanced and bonded.

In the balanced model, we want to submit a batch which can go to any
one of some set of engines and we don't care which.  It's up to the
kernel to pick an engine.  Imagine you had 64 identical HW compute
queues, for instance.  This could be done by making all the identical
engines share a single drm_gpu_scheduler and round-robin around the HW
queues or something.  I don't know that we strictly need drm/scheduler
to be aware of it but it might be nice if it grew support for this
mode so we could maintain a 1:1 relationship between HW queues and
drm_gpu_schedulers.  That said, I'm not sure how this would play with
GuC queues so maybe it doesn't help?
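
To make the balanced case concrete, here is a purely illustrative sketch of
the idea (round-robin over a set of identical HW queues behind one
scheduler). None of these type or function names are real drm/scheduler,
GuC, or i915 symbols; the real thing would of course live in the kernel.

#include <stdatomic.h>

#define NUM_HW_QUEUES 64

struct hw_queue {
	int id;
	/* ring, doorbell, etc. would live here */
};

struct balanced_sched {
	struct hw_queue queues[NUM_HW_QUEUES];	/* identical queues */
	atomic_uint next;			/* round-robin cursor */
};

/* The submitter doesn't care which queue it lands on; just pick the next. */
static struct hw_queue *balanced_pick(struct balanced_sched *s)
{
	unsigned int idx = atomic_fetch_add(&s->next, 1) % NUM_HW_QUEUES;

	return &s->queues[idx];
}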

The bonded model is like your gang submit, I think.  We want to submit N
batches to run in parallel.  And they actually have to be executing on
the GPU simultaneously and not just sort-of at similar times.  We need
this for video.  There are also potential use-cases in Vulkan or even
GL that might be able to use this.  One difference with the balanced
mode is that bonds don't, strictly speaking, need to be on the same
type of engine.  Imagine, for instance, a 3D batch with a parallel
compute batch doing vertex pre-processing.

I'm pretty sure the bonded case is something that the mobile drivers
(panfrost, etc.) would like as well for doing Vulkan on tilers where
you often have to have two command buffers running in parallel.
They're currently doing it by submitting a giant pile of batches where
they split the batch and add sync primitives every time some GL call
requires them to sync between fragment and vertex pipes.
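
As a minimal sketch of what configuring such a two-wide parallel (bonded)
slot might look like with the extension proposed further down this thread:
the struct fields below are taken from the RFC header quoted in this
thread and i915_user_extension follows i915_drm.h, but the helper and the
surrounding context-param/engines plumbing are hypothetical and elided.

#include <string.h>
#include <linux/types.h>

struct i915_user_extension {
	__u64 next_extension;
	__u32 name;
	__u32 flags;
	__u32 rsvd[4];
};

#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2

struct i915_context_engines_parallel_submit {
	struct i915_user_extension base;
	__u16 engine_index;	/* slot for parallel engine */
	__u16 width;		/* number of contexts per parallel engine */
	__u16 num_siblings;	/* number of siblings per context */
	__u16 mbz16;
	/* flags and the engines[] array follow in the full extension */
};

/* Hypothetical helper: describe one slot that takes 2 BBs per execbuf,
 * each context having 2 candidate sibling engines (as in the RFC's
 * "Example 1"). The actual ioctl plumbing is omitted. */
static void setup_parallel_slot(struct i915_context_engines_parallel_submit *p)
{
	memset(p, 0, sizeof(*p));
	p->base.name = I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT;
	p->engine_index = 0;	/* which engines[] slot becomes parallel */
	p->width = 2;		/* 2 BBs submitted per execbuf IOCTL */
	p->num_siblings = 2;	/* 2 placement candidates per context */
}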

So, to sum up, I think there's likely some good collaboration to be
had here for everyone. :-)

--Jason

> > > Sure, I've done a quick PoC internally and have been able to hook this
> > > into the DRM scheduler.
> > >
> > > Basically each BB still maps to a single job as each job is somewhat
> > > unique (e.g. each job has its own ring, lrc, seqno, etc...). However all
> > > the jobs configured to run in parallel map to a single sched_entity
> > > which maintains the order each job was generated from the execbuf IOCTL
> > > (1 - N). When the backend receives jobs 1 to N - 1 it basically just
> > > updates some internal state. When the backend sees job N (last job) it
> > > actually does the submit for jobs 1 - N which with GuC submission is a
> > > simple command moving the LRC tail of the N jobs.
> > >
> > > Daniel has suggested that we create a single job for the NN BBs but that
> > > would be huge rework to the internals of the i915 and likely won't
> > > happen by the time this code first lands.
> > >
> > > Also worth noting one way a job isn't really a treated individually is
> > > the excl slot with dma-resv. In that case we create a composite fence of
> > > all jobs (dma_fence_array).
> >
> > Yeah, that's something we have discussed as well.
> >
> > How do you prevent the scheduler from over committing to a single ring
> > buffer in this scenario?
> >
>
> Each job has its own ring, the execbuf IOCTL throttles itself for each
> job if there isn't space in the ring. This is exactly the same as
> non-parallel submits.
>
> I think this is what you were asking? If not, maybe try explaining the
> question a bit more.
>
> Matt
>
> > Christian.
> >
> > >
> > > Matt
> > >
> > > > Thanks,
> > > > Christian.
> > > >
> > > > On 19.05.21 01:58, Matthew Brost wrote:
> > > > > Add entry fpr i915 new parallel submission uAPI plan.
> > > > >
> > > > > v2:
> > > > >(Daniel Vetter):
> > > > > - Expand logical order explaination
> > > > > - Add dummy header
> > > > > - Only allow N BBs in execbuf IOCTL
> > > > > - Configure parallel submission per slot not per gem context
> > > > >
> > > > > Cc: Tvrtko Ursulin 
> > > > > Cc: Tony Ye 
> > > > > CC: Carl Zhang 
> > > > > Cc: Daniel Vetter 
> > > > > Cc: Jason Ekstrand 
> > > > > Signed-off-by: Matthew Brost 
> > > > > ---
> > > > >Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 
> > > > > ++
> > > > >Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > > >2 files changed, 196 insertions(+), 1 deletion(-)
> > > > >create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > >
> > > > > diff --git 

Re: [Mesa-dev] [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 08:10:59AM -0700, Matthew Brost wrote:
> On Thu, May 20, 2021 at 11:54:25AM +0200, Daniel Vetter wrote:
> > On Wed, May 19, 2021 at 7:19 PM Matthew Brost  
> > wrote:
> > >
> > > On Wed, May 19, 2021 at 01:10:04PM +0200, Daniel Vetter wrote:
> > > > On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:
> > > > > Add entry fpr i915 new parallel submission uAPI plan.
> > > > >
> > > > > v2:
> > > > >  (Daniel Vetter):
> > > > >   - Expand logical order explaination
> > > > >   - Add dummy header
> > > > >   - Only allow N BBs in execbuf IOCTL
> > > > >   - Configure parallel submission per slot not per gem context
> > > > >
> > > > > Cc: Tvrtko Ursulin 
> > > > > Cc: Tony Ye 
> > > > > CC: Carl Zhang 
> > > > > Cc: Daniel Vetter 
> > > > > Cc: Jason Ekstrand 
> > > > > Signed-off-by: Matthew Brost 
> > > > > ---
> > > > >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 
> > > > > ++
> > > > >  Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > > >  2 files changed, 196 insertions(+), 1 deletion(-)
> > > > >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > >
> > > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > new file mode 100644
> > > > > index ..8c64b983ccad
> > > > > --- /dev/null
> > > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > @@ -0,0 +1,144 @@
> > > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > > i915_context_engines_parallel_submit */
> > > > > +
> > > > > +/*
> > > > > + * i915_context_engines_parallel_submit:
> > > > > + *
> > > > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > > > execbuf IOCTL.
> > > > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > > > Multiple
> > > > > + * hardware contexts are created internally in the i915 run these 
> > > > > BBs. Once a
> > > > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > > > execbuf
> > > > > + * IOCTL and this is implict behavior (e.g. the user doesn't tell 
> > > > > the execbuf
> > > > > + * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there 
> > > > > are based on
> > > > > + * the slots configuration).
> > > > > + *
> > > > > + * Their are two currently defined ways to control the placement of 
> > > > > the
> > > > > + * hardware contexts on physical engines: default behavior (no 
> > > > > flags) and
> > > > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the 
> > > > > in the
> > > > > + * future as new hardware / use cases arise. Details of how to use 
> > > > > this
> > > > > + * interface below above the flags.
> > > > > + *
> > > > > + * Returns -EINVAL if hardware context placement configuration 
> > > > > invalid or if the
> > > > > + * placement configuration isn't supported on the platform / 
> > > > > submission
> > > > > + * interface.
> > > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > > submission
> > > > > + * inteface.
> > > > > + */
> > > > > +struct i915_context_engines_parallel_submit {
> > > > > +   struct i915_user_extension base;
> > > > > +
> > > > > +   __u16 engine_index; /* slot for parallel engine */
> > > > > +   __u16 width;/* number of contexts per parallel engine 
> > > > > */
> > > > > +   __u16 num_siblings; /* number of siblings per context */
> > > > > +   __u16 mbz16;
> > > >
> > > > Ok the big picture looks reasonable now, the flags still confuse me.
> > > >
> > >
> > > Yea, it is a bit confusing.
> > >
> > > > > +/*
> > > > > + * Default placement behvavior (currently unsupported):
> > > > > + *
> > > > > + * Rather than restricting parallel submission to a single class 
> > > > > with a
> > > > > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add 
> > > > > a mode that
> > > > > + * enables parallel submission across multiple engine classes. In 
> > > > > this case each
> > > > > + * context's logical engine mask indicates where that context can 
> > > > > placed. It is
> > > > > + * implied in this mode that all contexts have mutual exclusive 
> > > > > placement (e.g.
> > > > > + * if one context is running CS0 no other contexts can run on CS0).
> > > > > + *
> > > > > + * Example 1 pseudo code:
> > > > > + * CSX[Y] = engine class X, logical instance Y
> > > > > + * INVALID = I915_ENGINE_CLASS_INVALID, 
> > > > > I915_ENGINE_CLASS_INVALID_NONE
> > > > > + * set_engines(INVALID)
> > > > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > > > + * engines=CS0[0],CS0[1],CS1[0],CS1[1])
> > > > > + *
> > > > > + * Results in the following valid placements:
> > > > > + * CS0[0], CS1[0]
> > > > > + * CS0[0], CS1[1]
> > > > > + * CS0[1], CS1[0]
> > > > > + * CS0[1], CS1[1]
> > > > > + *
> > > > > + * This can also be though of as 2 virtual engines:
> > > > > + * VE[0] 

Re: [Mesa-dev] [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Daniel Vetter
On Thu, May 20, 2021 at 11:57:44AM +0100, Tvrtko Ursulin wrote:
> 
> On 20/05/2021 10:54, Daniel Vetter wrote:
> > On Wed, May 19, 2021 at 7:19 PM Matthew Brost  
> > wrote:
> > > 
> > > On Wed, May 19, 2021 at 01:10:04PM +0200, Daniel Vetter wrote:
> > > > On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:
> > > > > Add entry fpr i915 new parallel submission uAPI plan.
> > > > > 
> > > > > v2:
> > > > >   (Daniel Vetter):
> > > > >- Expand logical order explaination
> > > > >- Add dummy header
> > > > >- Only allow N BBs in execbuf IOCTL
> > > > >- Configure parallel submission per slot not per gem context
> > > > > 
> > > > > Cc: Tvrtko Ursulin 
> > > > > Cc: Tony Ye 
> > > > > CC: Carl Zhang 
> > > > > Cc: Daniel Vetter 
> > > > > Cc: Jason Ekstrand 
> > > > > Signed-off-by: Matthew Brost 
> > > > > ---
> > > > >   Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 
> > > > > ++
> > > > >   Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > > >   2 files changed, 196 insertions(+), 1 deletion(-)
> > > > >   create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > 
> > > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > new file mode 100644
> > > > > index ..8c64b983ccad
> > > > > --- /dev/null
> > > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > > @@ -0,0 +1,144 @@
> > > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > > i915_context_engines_parallel_submit */
> > > > > +
> > > > > +/*
> > > > > + * i915_context_engines_parallel_submit:
> > > > > + *
> > > > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > > > execbuf IOCTL.
> > > > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > > > Multiple
> > > > > + * hardware contexts are created internally in the i915 run these 
> > > > > BBs. Once a
> > > > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > > > execbuf
> > > > > + * IOCTL and this is implict behavior (e.g. the user doesn't tell 
> > > > > the execbuf
> > > > > + * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there 
> > > > > are based on
> > > > > + * the slots configuration).
> > > > > + *
> > > > > + * Their are two currently defined ways to control the placement of 
> > > > > the
> > > > > + * hardware contexts on physical engines: default behavior (no 
> > > > > flags) and
> > > > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the 
> > > > > in the
> > > > > + * future as new hardware / use cases arise. Details of how to use 
> > > > > this
> > > > > + * interface below above the flags.
> > > > > + *
> > > > > + * Returns -EINVAL if hardware context placement configuration 
> > > > > invalid or if the
> > > > > + * placement configuration isn't supported on the platform / 
> > > > > submission
> > > > > + * interface.
> > > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > > submission
> > > > > + * inteface.
> > > > > + */
> > > > > +struct i915_context_engines_parallel_submit {
> > > > > +   struct i915_user_extension base;
> > > > > +
> > > > > +   __u16 engine_index; /* slot for parallel engine */
> > > > > +   __u16 width;/* number of contexts per parallel engine 
> > > > > */
> > > > > +   __u16 num_siblings; /* number of siblings per context */
> > > > > +   __u16 mbz16;
> > > > 
> > > > Ok the big picture looks reasonable now, the flags still confuse me.
> > > > 
> > > 
> > > Yea, it is a bit confusing.
> > > 
> > > > > +/*
> > > > > + * Default placement behvavior (currently unsupported):
> > > > > + *
> > > > > + * Rather than restricting parallel submission to a single class 
> > > > > with a
> > > > > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add 
> > > > > a mode that
> > > > > + * enables parallel submission across multiple engine classes. In 
> > > > > this case each
> > > > > + * context's logical engine mask indicates where that context can 
> > > > > placed. It is
> > > > > + * implied in this mode that all contexts have mutual exclusive 
> > > > > placement (e.g.
> > > > > + * if one context is running CS0 no other contexts can run on CS0).
> > > > > + *
> > > > > + * Example 1 pseudo code:
> > > > > + * CSX[Y] = engine class X, logical instance Y
> > > > > + * INVALID = I915_ENGINE_CLASS_INVALID, 
> > > > > I915_ENGINE_CLASS_INVALID_NONE
> > > > > + * set_engines(INVALID)
> > > > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > > > + * engines=CS0[0],CS0[1],CS1[0],CS1[1])
> > > > > + *
> > > > > + * Results in the following valid placements:
> > > > > + * CS0[0], CS1[0]
> > > > > + * CS0[0], CS1[1]
> > > > > + * CS0[1], CS1[0]
> > > > > + * CS0[1], CS1[1]
> > > > > + *
> > > > > + * This can also be though of as 2 virtual engines:
> > > > > + * VE[0] 

Re: [Mesa-dev] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Matthew Brost
On Thu, May 20, 2021 at 01:11:59PM +0200, Christian König wrote:
> On 19.05.21 18:51, Matthew Brost wrote:
> > On Wed, May 19, 2021 at 01:45:39PM +0200, Christian König wrote:
> > > Oh, yeah we call that gang submit on the AMD side.
> > > 
> > > Had already some internal discussions how to implement this, but so far
> > > couldn't figure out how to cleanly introduce that into the DRM scheduler.
> > > 
> > > Can you briefly describe in a few words how that is supposed to work on 
> > > the
> > > Intel side?
> > > 
> > Sure, I've done a quick PoC internally and have been able to hook this
> > into the DRM scheduler.
> > 
> > Basically each BB still maps to a single job as each job is somewhat
> > unique (e.g. each job has its own ring, lrc, seqno, etc...). However all
> > the jobs configured to run in parallel map to a single sched_entity
> > which maintains the order each job was generated from the execbuf IOCTL
> > (1 - N). When the backend receives jobs 1 to N - 1 it basically just
> > updates some internal state. When the backend sees job N (last job) it
> > actually does the submit for jobs 1 - N which with GuC submission is a
> > simple command moving the LRC tail of the N jobs.
> > 
> > Daniel has suggested that we create a single job for the NN BBs but that
> > would be huge rework to the internals of the i915 and likely won't
> > happen by the time this code first lands.
> > 
> > Also worth noting one way a job isn't really a treated individually is
> > the excl slot with dma-resv. In that case we create a composite fence of
> > all jobs (dma_fence_array).
> 
> Yeah, that's something we have discussed as well.
> 
> How do you prevent the scheduler from over committing to a single ring
> buffer in this scenario?
> 

Each job has its own ring, the execbuf IOCTL throttles itself for each
job if there isn't space in the ring. This is exactly the same as
non-parallel submits.

I think this is what you were asking? If not, maybe try explaining the
question a bit more.
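
For illustration only, a sketch of the "hold jobs until the last one
arrives" flow described earlier in this thread; none of these names are
real i915, GuC, or DRM scheduler symbols, and locking / error handling are
ignored.

struct fake_job {
	unsigned int pos;	/* 1..N, order from the execbuf IOCTL */
	/* each job still has its own ring, lrc, seqno, ... */
};

struct fake_parallel_entity {
	unsigned int width;		/* N jobs per parallel submit */
	unsigned int received;		/* jobs buffered so far */
	struct fake_job *group[16];	/* assumes width <= 16 */
};

/* With something GuC-like this would boil down to one command updating
 * the LRC tails of all N contexts at once. */
static void backend_submit_group(struct fake_parallel_entity *e)
{
	for (unsigned int i = 0; i < e->width; i++)
		(void)e->group[i];	/* ring doorbell / move LRC tail */
	e->received = 0;
}

/* Called once per job, in execbuf order: jobs 1..N-1 just update state,
 * job N triggers the actual submission of the whole group. */
static void backend_queue_job(struct fake_parallel_entity *e,
			      struct fake_job *job)
{
	e->group[e->received++] = job;
	if (e->received == e->width)
		backend_submit_group(e);
}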

Matt

> Christian.
> 
> > 
> > Matt
> > 
> > > Thanks,
> > > Christian.
> > > 
> > > On 19.05.21 01:58, Matthew Brost wrote:
> > > > Add entry fpr i915 new parallel submission uAPI plan.
> > > > 
> > > > v2:
> > > >(Daniel Vetter):
> > > > - Expand logical order explaination
> > > > - Add dummy header
> > > > - Only allow N BBs in execbuf IOCTL
> > > > - Configure parallel submission per slot not per gem context
> > > > 
> > > > Cc: Tvrtko Ursulin 
> > > > Cc: Tony Ye 
> > > > CC: Carl Zhang 
> > > > Cc: Daniel Vetter 
> > > > Cc: Jason Ekstrand 
> > > > Signed-off-by: Matthew Brost 
> > > > ---
> > > >Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 
> > > > ++
> > > >Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > >2 files changed, 196 insertions(+), 1 deletion(-)
> > > >create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > 
> > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > new file mode 100644
> > > > index ..8c64b983ccad
> > > > --- /dev/null
> > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > @@ -0,0 +1,144 @@
> > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > i915_context_engines_parallel_submit */
> > > > +
> > > > +/*
> > > > + * i915_context_engines_parallel_submit:
> > > > + *
> > > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > > execbuf IOCTL.
> > > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > > Multiple
> > > > + * hardware contexts are created internally in the i915 run these BBs. 
> > > > Once a
> > > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > > execbuf
> > > > + * IOCTL and this is implict behavior (e.g. the user doesn't tell the 
> > > > execbuf
> > > > + * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there 
> > > > are based on
> > > > + * the slots configuration).
> > > > + *
> > > > + * Their are two currently defined ways to control the placement of the
> > > > + * hardware contexts on physical engines: default behavior (no flags) 
> > > > and
> > > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the 
> > > > in the
> > > > + * future as new hardware / use cases arise. Details of how to use this
> > > > + * interface below above the flags.
> > > > + *
> > > > + * Returns -EINVAL if hardware context placement configuration invalid 
> > > > or if the
> > > > + * placement configuration isn't supported on the platform / submission
> > > > + * interface.
> > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > submission
> > > > + * inteface.
> > > > + */
> > > > +struct i915_context_engines_parallel_submit {
> > > > +   struct i915_user_extension base;
> > > > +
> > > > +   __u16 engine_index; /* slot for parallel engine */
> 

Re: [Mesa-dev] [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Matthew Brost
On Thu, May 20, 2021 at 11:54:25AM +0200, Daniel Vetter wrote:
> On Wed, May 19, 2021 at 7:19 PM Matthew Brost  wrote:
> >
> > On Wed, May 19, 2021 at 01:10:04PM +0200, Daniel Vetter wrote:
> > > On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:
> > > > Add entry fpr i915 new parallel submission uAPI plan.
> > > >
> > > > v2:
> > > >  (Daniel Vetter):
> > > >   - Expand logical order explaination
> > > >   - Add dummy header
> > > >   - Only allow N BBs in execbuf IOCTL
> > > >   - Configure parallel submission per slot not per gem context
> > > >
> > > > Cc: Tvrtko Ursulin 
> > > > Cc: Tony Ye 
> > > > CC: Carl Zhang 
> > > > Cc: Daniel Vetter 
> > > > Cc: Jason Ekstrand 
> > > > Signed-off-by: Matthew Brost 
> > > > ---
> > > >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 ++
> > > >  Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > > >  2 files changed, 196 insertions(+), 1 deletion(-)
> > > >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > >
> > > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > new file mode 100644
> > > > index ..8c64b983ccad
> > > > --- /dev/null
> > > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > > @@ -0,0 +1,144 @@
> > > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > > i915_context_engines_parallel_submit */
> > > > +
> > > > +/*
> > > > + * i915_context_engines_parallel_submit:
> > > > + *
> > > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > > execbuf IOCTL.
> > > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > > Multiple
> > > > + * hardware contexts are created internally in the i915 run these BBs. 
> > > > Once a
> > > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > > execbuf
> > > > + * IOCTL and this is implict behavior (e.g. the user doesn't tell the 
> > > > execbuf
> > > > + * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there 
> > > > are based on
> > > > + * the slots configuration).
> > > > + *
> > > > + * Their are two currently defined ways to control the placement of the
> > > > + * hardware contexts on physical engines: default behavior (no flags) 
> > > > and
> > > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the 
> > > > in the
> > > > + * future as new hardware / use cases arise. Details of how to use this
> > > > + * interface below above the flags.
> > > > + *
> > > > + * Returns -EINVAL if hardware context placement configuration invalid 
> > > > or if the
> > > > + * placement configuration isn't supported on the platform / submission
> > > > + * interface.
> > > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > > submission
> > > > + * inteface.
> > > > + */
> > > > +struct i915_context_engines_parallel_submit {
> > > > +   struct i915_user_extension base;
> > > > +
> > > > +   __u16 engine_index; /* slot for parallel engine */
> > > > +   __u16 width;/* number of contexts per parallel engine */
> > > > +   __u16 num_siblings; /* number of siblings per context */
> > > > +   __u16 mbz16;
> > >
> > > Ok the big picture looks reasonable now, the flags still confuse me.
> > >
> >
> > Yea, it is a bit confusing.
> >
> > > > +/*
> > > > + * Default placement behvavior (currently unsupported):
> > > > + *
> > > > + * Rather than restricting parallel submission to a single class with a
> > > > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a 
> > > > mode that
> > > > + * enables parallel submission across multiple engine classes. In this 
> > > > case each
> > > > + * context's logical engine mask indicates where that context can 
> > > > placed. It is
> > > > + * implied in this mode that all contexts have mutual exclusive 
> > > > placement (e.g.
> > > > + * if one context is running CS0 no other contexts can run on CS0).
> > > > + *
> > > > + * Example 1 pseudo code:
> > > > + * CSX[Y] = engine class X, logical instance Y
> > > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > > > + * set_engines(INVALID)
> > > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > > + * engines=CS0[0],CS0[1],CS1[0],CS1[1])
> > > > + *
> > > > + * Results in the following valid placements:
> > > > + * CS0[0], CS1[0]
> > > > + * CS0[0], CS1[1]
> > > > + * CS0[1], CS1[0]
> > > > + * CS0[1], CS1[1]
> > > > + *
> > > > + * This can also be though of as 2 virtual engines:
> > > > + * VE[0] = CS0[0], CS0[1]
> > > > + * VE[1] = CS1[0], CS1[1]
> > > > + *
> > > > + * Example 2 pseudo code:
> > > > + * CS[X] = generic engine of same class, logical instance X
> > > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > > > + * set_engines(INVALID)
> > > > + * set_parallel(engine_index=0, width=2, num_siblings=3,
> > > > + * 

Re: [Mesa-dev] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Christian König

On 19.05.21 18:51, Matthew Brost wrote:

On Wed, May 19, 2021 at 01:45:39PM +0200, Christian König wrote:

Oh, yeah we call that gang submit on the AMD side.

Had already some internal discussions how to implement this, but so far
couldn't figure out how to cleanly introduce that into the DRM scheduler.

Can you briefly describe in a few words how that is supposed to work on the
Intel side?


Sure, I've done a quick PoC internally and have been able to hook this
into the DRM scheduler.

Basically each BB still maps to a single job as each job is somewhat
unique (e.g. each job has its own ring, lrc, seqno, etc...). However all
the jobs configured to run in parallel map to a single sched_entity
which maintains the order each job was generated from the execbuf IOCTL
(1 - N). When the backend receives jobs 1 to N - 1 it basically just
updates some internal state. When the backend sees job N (last job) it
actually does the submit for jobs 1 - N which with GuC submission is a
simple command moving the LRC tail of the N jobs.

Daniel has suggested that we create a single job for the NN BBs but that
would be huge rework to the internals of the i915 and likely won't
happen by the time this code first lands.

Also worth noting one way a job isn't really a treated individually is
the excl slot with dma-resv. In that case we create a composite fence of
all jobs (dma_fence_array).


Yeah, that's something we have discussed as well.

How do you prevent the scheduler from over committing to a single ring 
buffer in this scenario?


Christian.



Matt


Thanks,
Christian.

On 19.05.21 01:58, Matthew Brost wrote:

Add entry fpr i915 new parallel submission uAPI plan.

v2:
   (Daniel Vetter):
- Expand logical order explaination
- Add dummy header
- Only allow N BBs in execbuf IOCTL
- Configure parallel submission per slot not per gem context

Cc: Tvrtko Ursulin 
Cc: Tony Ye 
CC: Carl Zhang 
Cc: Daniel Vetter 
Cc: Jason Ekstrand 
Signed-off-by: Matthew Brost 
---
   Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 ++
   Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
   2 files changed, 196 insertions(+), 1 deletion(-)
   create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
b/Documentation/gpu/rfc/i915_parallel_execbuf.h
new file mode 100644
index ..8c64b983ccad
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
@@ -0,0 +1,144 @@
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
i915_context_engines_parallel_submit */
+
+/*
+ * i915_context_engines_parallel_submit:
+ *
+ * Setup a slot to allow multiple BBs to be submitted in a single execbuf 
IOCTL.
+ * Those BBs will then be scheduled to run on the GPU in parallel. Multiple
+ * hardware contexts are created internally in the i915 run these BBs. Once a
+ * slot is configured for N BBs only N BBs can be submitted in each execbuf
+ * IOCTL and this is implict behavior (e.g. the user doesn't tell the execbuf
+ * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there are based 
on
+ * the slots configuration).
+ *
+ * Their are two currently defined ways to control the placement of the
+ * hardware contexts on physical engines: default behavior (no flags) and
+ * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the in the
+ * future as new hardware / use cases arise. Details of how to use this
+ * interface below above the flags.
+ *
+ * Returns -EINVAL if hardware context placement configuration invalid or if 
the
+ * placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * inteface.
+ */
+struct i915_context_engines_parallel_submit {
+   struct i915_user_extension base;
+
+   __u16 engine_index; /* slot for parallel engine */
+   __u16 width;/* number of contexts per parallel engine */
+   __u16 num_siblings; /* number of siblings per context */
+   __u16 mbz16;
+/*
+ * Default placement behvavior (currently unsupported):
+ *
+ * Rather than restricting parallel submission to a single class with a
+ * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a mode 
that
+ * enables parallel submission across multiple engine classes. In this case 
each
+ * context's logical engine mask indicates where that context can placed. It is
+ * implied in this mode that all contexts have mutual exclusive placement (e.g.
+ * if one context is running CS0 no other contexts can run on CS0).
+ *
+ * Example 1 pseudo code:
+ * CSX[Y] = engine class X, logical instance Y
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ * engines=CS0[0],CS0[1],CS1[0],CS1[1])
+ *
+ * Results in the following valid placements:
+ * CS0[0], CS1[0]
+ * 

[Mesa-dev] XDC 2021: Registration & Call for Proposals now open!

2021-05-20 Thread Szwichtenberg, Radoslaw
Hello!

Registration & Call for Proposals are now open for XDC 2021, which will
take place on September 15-17, 2021. This year the event will again be
virtual.

https://indico.freedesktop.org/event/1/

As usual, the conference is free of charge and open to the general
public. If you plan on attending, please make sure to register as early
as possible!

To register as an attendee, you will need to register via the XDC website.
As XDC has moved to a new Indico infrastructure, you will need to create a
new account even if you previously registered on the XDC website.

https://indico.freedesktop.org/event/1/registrations/1/

In addition to registration, the CfP is now open for talks, workshops
and demos at XDC 2021. While any serious proposal will be gratefully
considered, topics of interest to X.Org and freedesktop.org developers
are encouraged. The program focus is on new development, ongoing
challenges and anything else that will spark discussions among
attendees in the hallway track.

We are open to talks across all layers of the graphics stack, from the
kernel to desktop environments and graphical applications, as well as
talks on how to make things better for the developers who build them.
Head to the CfP page to learn more:

https://indico.freedesktop.org/event/1/abstracts/

The deadline for submissions is Sunday, 4 July 2021.

Last year we modified our Reimbursement Policy to accept speaker
expenses for X.Org virtual events like XDC 2021. Check it out here:

https://www.x.org/wiki/XorgFoundation/Policies/Reimbursement/

If you have any questions, please send me an email at
radoslaw.szwichtenb...@intel.com, adding the X.org board on CC (board
at foundation.x.org).

And don't forget, you can follow us on Twitter for all the latest
updates and to stay connected:


https://twitter.com/XOrgDevConf

Best,

Radek

P.S.: a DNS redirect (xdc2021.x.org) is a work in progress. Please use
the links above for the moment.


Radosław Szwichtenberg
-
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173, 80-298 Gdansk
KRS 101882 - NIP 957-07-52-316



Re: [Mesa-dev] [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Tvrtko Ursulin



On 20/05/2021 10:54, Daniel Vetter wrote:

On Wed, May 19, 2021 at 7:19 PM Matthew Brost  wrote:


On Wed, May 19, 2021 at 01:10:04PM +0200, Daniel Vetter wrote:

On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:

Add entry fpr i915 new parallel submission uAPI plan.

v2:
  (Daniel Vetter):
   - Expand logical order explaination
   - Add dummy header
   - Only allow N BBs in execbuf IOCTL
   - Configure parallel submission per slot not per gem context

Cc: Tvrtko Ursulin 
Cc: Tony Ye 
CC: Carl Zhang 
Cc: Daniel Vetter 
Cc: Jason Ekstrand 
Signed-off-by: Matthew Brost 
---
  Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 ++
  Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
  2 files changed, 196 insertions(+), 1 deletion(-)
  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h

diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
b/Documentation/gpu/rfc/i915_parallel_execbuf.h
new file mode 100644
index ..8c64b983ccad
--- /dev/null
+++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
@@ -0,0 +1,144 @@
+#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
i915_context_engines_parallel_submit */
+
+/*
+ * i915_context_engines_parallel_submit:
+ *
+ * Setup a slot to allow multiple BBs to be submitted in a single execbuf 
IOCTL.
+ * Those BBs will then be scheduled to run on the GPU in parallel. Multiple
+ * hardware contexts are created internally in the i915 run these BBs. Once a
+ * slot is configured for N BBs only N BBs can be submitted in each execbuf
+ * IOCTL and this is implict behavior (e.g. the user doesn't tell the execbuf
+ * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there are based 
on
+ * the slots configuration).
+ *
+ * Their are two currently defined ways to control the placement of the
+ * hardware contexts on physical engines: default behavior (no flags) and
+ * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the in the
+ * future as new hardware / use cases arise. Details of how to use this
+ * interface below above the flags.
+ *
+ * Returns -EINVAL if hardware context placement configuration invalid or if 
the
+ * placement configuration isn't supported on the platform / submission
+ * interface.
+ * Returns -ENODEV if extension isn't supported on the platform / submission
+ * inteface.
+ */
+struct i915_context_engines_parallel_submit {
+   struct i915_user_extension base;
+
+   __u16 engine_index; /* slot for parallel engine */
+   __u16 width;/* number of contexts per parallel engine */
+   __u16 num_siblings; /* number of siblings per context */
+   __u16 mbz16;


Ok the big picture looks reasonable now, the flags still confuse me.



Yea, it is a bit confusing.


+/*
+ * Default placement behvavior (currently unsupported):
+ *
+ * Rather than restricting parallel submission to a single class with a
+ * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a mode 
that
+ * enables parallel submission across multiple engine classes. In this case 
each
+ * context's logical engine mask indicates where that context can placed. It is
+ * implied in this mode that all contexts have mutual exclusive placement (e.g.
+ * if one context is running CS0 no other contexts can run on CS0).
+ *
+ * Example 1 pseudo code:
+ * CSX[Y] = engine class X, logical instance Y
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=2,
+ * engines=CS0[0],CS0[1],CS1[0],CS1[1])
+ *
+ * Results in the following valid placements:
+ * CS0[0], CS1[0]
+ * CS0[0], CS1[1]
+ * CS0[1], CS1[0]
+ * CS0[1], CS1[1]
+ *
+ * This can also be though of as 2 virtual engines:
+ * VE[0] = CS0[0], CS0[1]
+ * VE[1] = CS1[0], CS1[1]
+ *
+ * Example 2 pseudo code:
+ * CS[X] = generic engine of same class, logical instance X
+ * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
+ * set_engines(INVALID)
+ * set_parallel(engine_index=0, width=2, num_siblings=3,
+ * engines=CS[0],CS[1],CS[2],CS[0],CS[1],CS[2])
+ *
+ * Results in the following valid placements:
+ * CS[0], CS[1]
+ * CS[0], CS[2]
+ * CS[1], CS[0]
+ * CS[1], CS[2]
+ * CS[2], CS[0]
+ * CS[2], CS[1]
+ *
+ *
+ * This can also be though of as 2 virtual engines:
+ * VE[0] = CS[0], CS[1], CS[2]
+ * VE[1] = CS[0], CS[1], CS[2]
+
+ * This enables a use case where all engines are created equally, we don't care
+ * where they are scheduled, we just want a certain number of resources, for
+ * those resources to be scheduled in parallel, and possibly across multiple
+ * engine classes.
+ */


So I don't really get what this does compared to setting the flag below.
Is this just about running the batchbuffers the wrong way round, i.e. if
you have (simplest case)

width=2, num_sibglings=1, engines=CS[0], CS[1]

Then both
CS[0], CS[1]
and
CS[1], CS[0]
are possible options for running 2 batches? Iow, the backend is 

[Mesa-dev] Requests For Proposals for hosting XDC 2022 are now open

2021-05-20 Thread Samuel Iglesias Gonsálvez
Hello everyone!

The X.org board is soliciting proposals to host XDC in 2022. Since
XDC 2021 is being held in Europe this year (although virtually), we've
decided to host the 2022 event in North America. However, the board is
open to other locations, especially if there's an interesting co-location
with another conference.

Of course, due to the ongoing COVID-19 pandemic it's not yet clear
whether it will be possible to host XDC 2022 in person, although it
seems very likely. Because of this, we would like to make it clear that
sponsors should prepare for both the possibility of an in-person
conference and the possibility of a virtual one. We will work with
organizers on a deadline for deciding whether or not we'll be going
virtual, likely sometime around July 2022.

If you're considering hosting XDC, we've assembled a wiki page with
what's generally expected and needed:

https://www.x.org/wiki/Events/RFP/

When submitting your proposal, please make sure to include at least the
key information about the potential location, possible dates, and
estimated costs. Proposals can be submitted to board at foundation.x.org
until the deadline of *September 1st, 2021*.

Additionally, a quick early heads-up to the board if you're
considering hosting would be appreciated, in case we need to adjust the
schedule a bit. Earlier is also better, since there will generally be a
bit of Q&A with organizers.

And if you just have some questions about what organizing XDC entails,
please feel free to chat with previous organizers, or someone from the
board.

Thanks,

Sam





Re: [Mesa-dev] [Intel-gfx] [RFC 2/2] drm/doc/rfc: i915 new parallel submission uAPI plan

2021-05-20 Thread Daniel Vetter
On Wed, May 19, 2021 at 7:19 PM Matthew Brost  wrote:
>
> On Wed, May 19, 2021 at 01:10:04PM +0200, Daniel Vetter wrote:
> > On Tue, May 18, 2021 at 04:58:30PM -0700, Matthew Brost wrote:
> > > Add entry fpr i915 new parallel submission uAPI plan.
> > >
> > > v2:
> > >  (Daniel Vetter):
> > >   - Expand logical order explaination
> > >   - Add dummy header
> > >   - Only allow N BBs in execbuf IOCTL
> > >   - Configure parallel submission per slot not per gem context
> > >
> > > Cc: Tvrtko Ursulin 
> > > Cc: Tony Ye 
> > > CC: Carl Zhang 
> > > Cc: Daniel Vetter 
> > > Cc: Jason Ekstrand 
> > > Signed-off-by: Matthew Brost 
> > > ---
> > >  Documentation/gpu/rfc/i915_parallel_execbuf.h | 144 ++
> > >  Documentation/gpu/rfc/i915_scheduler.rst  |  53 ++-
> > >  2 files changed, 196 insertions(+), 1 deletion(-)
> > >  create mode 100644 Documentation/gpu/rfc/i915_parallel_execbuf.h
> > >
> > > diff --git a/Documentation/gpu/rfc/i915_parallel_execbuf.h 
> > > b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > new file mode 100644
> > > index ..8c64b983ccad
> > > --- /dev/null
> > > +++ b/Documentation/gpu/rfc/i915_parallel_execbuf.h
> > > @@ -0,0 +1,144 @@
> > > +#define I915_CONTEXT_ENGINES_EXT_PARALLEL_SUBMIT 2 /* see 
> > > i915_context_engines_parallel_submit */
> > > +
> > > +/*
> > > + * i915_context_engines_parallel_submit:
> > > + *
> > > + * Setup a slot to allow multiple BBs to be submitted in a single 
> > > execbuf IOCTL.
> > > + * Those BBs will then be scheduled to run on the GPU in parallel. 
> > > Multiple
> > > + * hardware contexts are created internally in the i915 run these BBs. 
> > > Once a
> > > + * slot is configured for N BBs only N BBs can be submitted in each 
> > > execbuf
> > > + * IOCTL and this is implict behavior (e.g. the user doesn't tell the 
> > > execbuf
> > > + * IOCTL there are N BBs, the execbuf IOCTL know how many BBs there are 
> > > based on
> > > + * the slots configuration).
> > > + *
> > > + * Their are two currently defined ways to control the placement of the
> > > + * hardware contexts on physical engines: default behavior (no flags) and
> > > + * I915_PARALLEL_IMPLICT_BONDS (a flag). More flags may be added the in 
> > > the
> > > + * future as new hardware / use cases arise. Details of how to use this
> > > + * interface below above the flags.
> > > + *
> > > + * Returns -EINVAL if hardware context placement configuration invalid 
> > > or if the
> > > + * placement configuration isn't supported on the platform / submission
> > > + * interface.
> > > + * Returns -ENODEV if extension isn't supported on the platform / 
> > > submission
> > > + * inteface.
> > > + */
> > > +struct i915_context_engines_parallel_submit {
> > > +   struct i915_user_extension base;
> > > +
> > > +   __u16 engine_index; /* slot for parallel engine */
> > > +   __u16 width;/* number of contexts per parallel engine */
> > > +   __u16 num_siblings; /* number of siblings per context */
> > > +   __u16 mbz16;
> >
> > Ok the big picture looks reasonable now, the flags still confuse me.
> >
>
> Yea, it is a bit confusing.
>
> > > +/*
> > > + * Default placement behvavior (currently unsupported):
> > > + *
> > > + * Rather than restricting parallel submission to a single class with a
> > > + * logically contiguous placement (I915_PARALLEL_IMPLICT_BONDS), add a 
> > > mode that
> > > + * enables parallel submission across multiple engine classes. In this 
> > > case each
> > > + * context's logical engine mask indicates where that context can 
> > > placed. It is
> > > + * implied in this mode that all contexts have mutual exclusive 
> > > placement (e.g.
> > > + * if one context is running CS0 no other contexts can run on CS0).
> > > + *
> > > + * Example 1 pseudo code:
> > > + * CSX[Y] = engine class X, logical instance Y
> > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > > + * set_engines(INVALID)
> > > + * set_parallel(engine_index=0, width=2, num_siblings=2,
> > > + * engines=CS0[0],CS0[1],CS1[0],CS1[1])
> > > + *
> > > + * Results in the following valid placements:
> > > + * CS0[0], CS1[0]
> > > + * CS0[0], CS1[1]
> > > + * CS0[1], CS1[0]
> > > + * CS0[1], CS1[1]
> > > + *
> > > + * This can also be though of as 2 virtual engines:
> > > + * VE[0] = CS0[0], CS0[1]
> > > + * VE[1] = CS1[0], CS1[1]
> > > + *
> > > + * Example 2 pseudo code:
> > > + * CS[X] = generic engine of same class, logical instance X
> > > + * INVALID = I915_ENGINE_CLASS_INVALID, I915_ENGINE_CLASS_INVALID_NONE
> > > + * set_engines(INVALID)
> > > + * set_parallel(engine_index=0, width=2, num_siblings=3,
> > > + * engines=CS[0],CS[1],CS[2],CS[0],CS[1],CS[2])
> > > + *
> > > + * Results in the following valid placements:
> > > + * CS[0], CS[1]
> > > + * CS[0], CS[2]
> > > + * CS[1], CS[0]
> > > + * CS[1], CS[2]
> > > + * CS[2], CS[0]
> > > + * CS[2], CS[1]
> > > + *
> > > + *
> > > + * This can also be though of as