On Mon, Sep 8, 2025 at 8:54 AM Christian König <christian.koe...@amd.com> wrote:
>
> On 05.09.25 20:39, Liu, Shaoyun wrote:
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> > I can confirm that during world switch the entire gfx block (including gfx,
> > compute and sdma for gfx10+) is switched together.
>
> Yeah, but that simply doesn't work as expected.
>
> The problem is that the world switch can't preempt running gfx shaders, and
> it can preempt compute shaders only when CWSR is available.
>
> Now what world switch currently does is to wait for the gfx draw to finish,
> then pause the gfx queue and then the other compute queues.
>
> When gfx starts first that approach works, but when the compute queue runs
> first we then try to preempt a compute queue which is waiting for the gfx
> draw to start.
>
> Since we don't have CWSR for this compute queue this results in a lockup at
> the moment.

Compute queues can still preempt without CWSR; it's just dispatch-level
(like gfx) rather than instruction-level preemption.

Alex

>
> Regards,
> Christian.
>
> >
> > Regards
> > Shaoyun.liu
> >
> > -----Original Message-----
> > From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Alex Deucher
> > Sent: Friday, September 5, 2025 9:32 AM
> > To: Christian König <ckoenig.leichtzumer...@gmail.com>
> > Cc: Deucher, Alexander <alexander.deuc...@amd.com>; amd-gfx@lists.freedesktop.org; timur.kris...@gmail.com
> > Subject: Re: [PATCH 2/2] drm/amdgpu: reject gang submissions under SRIOV
> >
> > On Fri, Sep 5, 2025 at 8:47 AM Christian König <ckoenig.leichtzumer...@gmail.com> wrote:
> >>
> >> Gang submission means that the kernel driver guarantees that multiple
> >> submissions are executed on the HW at the same time on different engines.
> >>
> >> Background is that those submissions then depend on each other and
> >> none of them can finish on its own.
> >>
> >> SRIOV now uses world switch to preempt submissions on the engines to
> >> allow sharing the HW resources between multiple VFs.
> >>
> >> The problem is now that the SRIOV world switch can't know about such
> >> interdependencies and will cause a timeout if it waits for a
> >> partially running gang submission.
> >>
> >> To conclude, SRIOV and gang submissions are fundamentally incompatible
> >> at the moment. For now just disable them.
> >
> > Are you sure about this? Thinking about this more, most gang submissions
> > are between gfx and compute. The entire GC block (gfx, compute, and sdma
> > on gfx10+) gets preempted on world switch so all of the active queues would
> > be preempted. Everything gets resumed when the VF gets switched back.
> > VCN/JPEG gets switched independently so that could be a problem if you have
> > a gang with VCN and GC, but I think all gangs within GC should in theory be
> > ok.
> >
> > Alex
> >
> >>
> >> Signed-off-by: Christian König <christian.koe...@amd.com>
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
> >>  1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> index 2ac9729e4c86..434a551365c7 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> >> @@ -286,7 +286,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
> >>  		}
> >>  	}
> >>
> >> -	if (!p->gang_size) {
> >> +	if (!p->gang_size || (amdgpu_sriov_vf(p->adev) && p->gang_size > 1)) {
> >>  		ret = -EINVAL;
> >>  		goto free_all_kdata;
> >>  	}
> >> --
> >> 2.43.0
> >>
>
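
For anyone reading along without the driver source at hand, the condition the quoted patch adds can be modelled in isolation. The following is only a sketch under stated assumptions, not the driver code: `struct cs_parser_model`, its `is_sriov_vf` field and `check_gang_size()` are hypothetical names invented for illustration. The boolean stands in for the result of `amdgpu_sriov_vf(p->adev)`, and the function returns the error directly where the real code sets `ret` and jumps to `free_all_kdata`.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for the two pieces of parser state the check uses.
 * This is NOT the real struct amdgpu_cs_parser.
 */
struct cs_parser_model {
	unsigned int gang_size; /* number of entities in the submission */
	bool is_sriov_vf;       /* models amdgpu_sriov_vf(p->adev) */
};

/*
 * Mirrors the patched condition:
 *   - an empty submission is always invalid (pre-existing behaviour);
 *   - under SRIOV, any submission with more than one gang member is
 *     rejected (the behaviour the patch proposes).
 * Returns 0 when the submission may proceed, -EINVAL otherwise.
 */
static int check_gang_size(const struct cs_parser_model *p)
{
	if (!p->gang_size)
		return -EINVAL;

	if (p->is_sriov_vf && p->gang_size > 1)
		return -EINVAL;

	return 0;
}
```

In this model a single-entity submission is still accepted on a VF, so only true gangs are affected. Whether rejecting every gang is necessary, or only gangs that span GC and VCN/JPEG, is exactly the open question in the thread above.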