On Fri, 2025-09-05 at 18:39 +0000, Liu, Shaoyun wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
> I can confirm that during a world switch the entire gfx block
> (including gfx, compute, and sdma for gfx10+) is switched together.
> 
> Regards
> Shaoyun.liu

Hi Everyone,

At the moment there are only two uses of gang submit:

1. Mesh + task shaders where GFX and ACE are used together
2. Transfer queues where SDMA and ACE are used together (not yet fully
implemented in RADV)

Based on the conversation above, it sounds like both of these would
work just fine under SRIOV.
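
To make the reasoning concrete, here is a minimal user-space sketch of the rule the thread seems to converge on: under SRIOV, a gang is only safe if every member sits in the GC block that is preempted together on world switch. The enum, the function names, and the `gfx10_plus` flag are all made up for illustration and are not amdgpu identifiers. Treating any gang that includes VCN or JPEG as unsafe is a conservative assumption on my part, not something confirmed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical engine classes for illustration; not the amdgpu enums. */
enum engine_type {
	ENGINE_GFX,
	ENGINE_COMPUTE,	/* ACE */
	ENGINE_SDMA,
	ENGINE_VCN,
	ENGINE_JPEG,
};

/*
 * Returns true if the engine is preempted as part of the GC block on
 * world switch: gfx and compute always, sdma only on gfx10+.
 */
static bool engine_in_gc_block(enum engine_type engine, bool gfx10_plus)
{
	switch (engine) {
	case ENGINE_GFX:
	case ENGINE_COMPUTE:
		return true;
	case ENGINE_SDMA:
		return gfx10_plus;
	default:
		/* VCN/JPEG are switched independently of GC. */
		return false;
	}
}

/*
 * Returns true if a gang with these members would be preempted and
 * resumed as a unit. A single submission is not really a gang and is
 * always fine. Assumption: any member outside GC makes the gang unsafe.
 */
static bool gang_ok_under_sriov(const enum engine_type *engines,
				size_t count, bool gfx10_plus)
{
	size_t i;

	assert(engines != NULL || count == 0);

	if (count <= 1)
		return true;

	for (i = 0; i < count; i++) {
		if (!engine_in_gc_block(engines[i], gfx10_plus))
			return false;
	}
	return true;
}
```

With this model, both current use cases pass on gfx10+ (GFX + ACE, and SDMA + ACE), while a gang mixing VCN with GFX is rejected. A check along these lines would be narrower than rejecting every gang with `gang_size > 1`.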

Thanks,
Timur


> 
> -----Original Message-----
> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of
> Alex Deucher
> Sent: Friday, September 5, 2025 9:32 AM
> To: Christian König <ckoenig.leichtzumer...@gmail.com>
> Cc: Deucher, Alexander <alexander.deuc...@amd.com>;
> amd-gfx@lists.freedesktop.org; timur.kris...@gmail.com
> Subject: Re: [PATCH 2/2] drm/amdgpu: reject gang submissions under
> SRIOV
> 
> On Fri, Sep 5, 2025 at 8:47 AM Christian König
> <ckoenig.leichtzumer...@gmail.com> wrote:
> > 
> > > Gang submission means that the kernel driver guarantees that multiple
> > > submissions are executed on the HW at the same time on different
> > > engines.
> > 
> > > The background is that those submissions depend on each other, and
> > > none of them can finish on its own.
> > 
> > > SRIOV uses world switch to preempt submissions on the engines to
> > > allow sharing the HW resources between multiple VFs.
> > 
> > > The problem is that the SRIOV world switch can't know about such
> > > interdependencies and will cause a timeout if it waits for a
> > > partially running gang submission.
> > 
> > > To conclude, SRIOV and gang submissions are fundamentally incompatible
> > > at the moment. For now, just disable them.
> 
> Are you sure about this?  Thinking about this more, most gang
> submissions are between gfx and compute.  The entire GC block (gfx,
> compute, and sdma on gfx10+) gets preempted on world switch so all of
> the active queues would be preempted.  Everything gets resumed when
> the VF gets switched back.  VCN/JPEG gets switched independently so
> that could be a problem if you have a gang with VCN and GC, but I
> think all gangs within GC should in theory be ok.
> 
> Alex
> 
> > 
> > Signed-off-by: Christian König <christian.koe...@amd.com>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index 2ac9729e4c86..434a551365c7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -286,7 +286,7 @@ static int amdgpu_cs_pass1(struct
> > amdgpu_cs_parser *p,
> >                 }
> >         }
> > 
> > -       if (!p->gang_size) {
> > > +       if (!p->gang_size || (amdgpu_sriov_vf(p->adev) && p->gang_size > 1)) {
> >                 ret = -EINVAL;
> >                 goto free_all_kdata;
> >         }
> > --
> > 2.43.0
> > 
