On Mon, Sep 8, 2025 at 8:54 AM Christian König <christian.koe...@amd.com> wrote:
>
> On 05.09.25 20:39, Liu, Shaoyun wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > I can confirm that during world switch the entire gfx block (including gfx, 
> > compute and sdma for gfx10+) is switched together.
>
> Yeah, but that simply doesn't work as expected.
>
> The problem is that the world switch can't preempt running gfx shaders, and 
> it can preempt compute shaders only when CWSR is available.
>
> Now what world switch currently does is to wait for the gfx draw to finish, 
> then pause the gfx queue and then the compute queues.
>
> When gfx starts first that approach works, but when the compute queue runs 
> first we then try to preempt a compute queue which is waiting for the gfx 
> draw to start.
>
> Since we don't have CWSR for this compute queue this results in a lockup at 
> the moment.

Compute queues can still preempt without CWSR; it's just dispatch-level
(like gfx) rather than instruction-level preemption.

Alex

>
> Regards,
> Christian.
>
> >
> > Regards
> > Shaoyun.liu
> >
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Alex 
> > Deucher
> > Sent: Friday, September 5, 2025 9:32 AM
> > To: Christian König <ckoenig.leichtzumer...@gmail.com>
> > Cc: Deucher, Alexander <alexander.deuc...@amd.com>; 
> > amd-gfx@lists.freedesktop.org; timur.kris...@gmail.com
> > Subject: Re: [PATCH 2/2] drm/amdgpu: reject gang submissions under SRIOV
> >
> > On Fri, Sep 5, 2025 at 8:47 AM Christian König 
> > <ckoenig.leichtzumer...@gmail.com> wrote:
> >>
> >> Gang submission means that the kernel driver guarantees that multiple
> >> submissions are executed on the HW at the same time on different engines.
> >>
> >> Background is that those submissions depend on each other, and
> >> none of them can finish on its own.
> >>
> >> SRIOV now uses world switch to preempt submissions on the engines to
> >> allow sharing the HW resources between multiple VFs.
> >>
> >> The problem is now that the SRIOV world switch can't know about such
> >> interdependencies and will cause a timeout if it waits for a
> >> partially running gang submission.
> >>
> >> To conclude, SRIOV and gang submissions are fundamentally incompatible
> >> at the moment. For now just disable them.
> >
> > Are you sure about this?  Thinking about this more, most gang submissions 
> > are between gfx and compute.  The entire GC block (gfx, compute, and sdma 
> > on gfx10+) gets preempted on world switch so all of the active queues would 
> > be preempted.  Everything gets resumed when the VF gets switched back.  
> > VCN/JPEG gets switched independently so that could be a problem if you have 
> > a gang with VCN and GC, but I think all gangs within GC should in theory be 
> > ok.
> >
> > Alex
> >
> >>
> >> Signed-off-by: Christian König <christian.koe...@amd.com>
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> index 2ac9729e4c86..434a551365c7 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> @@ -286,7 +286,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
> >>                 }
> >>         }
> >>
> >> -       if (!p->gang_size) {
> >> +       if (!p->gang_size || (amdgpu_sriov_vf(p->adev) && p->gang_size > 1)) {
> >>                 ret = -EINVAL;
> >>                 goto free_all_kdata;
> >>         }
> >> --
> >> 2.43.0
> >>
>
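
For readers following the quoted patch, here is a minimal userspace sketch of
the predicate it adds. The struct cs_parser_sketch, its is_sriov_vf field, and
the function check_gang_size() are hypothetical stand-ins for illustration
only; the real code tests amdgpu_sriov_vf(p->adev) on struct amdgpu_cs_parser
inside amdgpu_cs_pass1(). The sketch only models which gang sizes would be
rejected and takes no position on whether the rejection is needed.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the parser state; NOT the real
 * struct amdgpu_cs_parser from the kernel. */
struct cs_parser_sketch {
	unsigned int gang_size; /* number of submissions in the gang */
	bool is_sriov_vf;       /* models amdgpu_sriov_vf(p->adev) */
};

/* Mirrors the condition from the quoted hunk: an empty gang is always
 * invalid, and under SRIOV any gang with more than one member is
 * rejected as well. Returns 0 when the submission may proceed. */
static int check_gang_size(const struct cs_parser_sketch *p)
{
	if (!p->gang_size || (p->is_sriov_vf && p->gang_size > 1))
		return -EINVAL;
	return 0;
}
```

With this model, a single submission (gang_size == 1) is still accepted on a
VF, while a gang of two is accepted on bare metal and rejected with -EINVAL
on a VF.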
