On 05.09.25 20:39, Liu, Shaoyun wrote:
> [AMD Official Use Only - AMD Internal Distribution Only]
> 
> I can confirm that during world switch the entire gfx block (including gfx, 
> compute and sdma for gfx10+) is switched together.

Yeah, but that simply doesn't work as expected.

The problem is that the world switch can't preempt running gfx shaders at all, 
and it can preempt compute shaders only when CWSR is available.

What the world switch currently does is wait for the gfx draw to finish, 
then pause the gfx queue, and then pause the compute queues.

When gfx starts first that approach works, but when the compute queue runs 
first we then try to preempt a compute queue which is waiting for the gfx draw 
to start.

Since we don't have CWSR for this compute queue this results in a lockup at the 
moment.
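
To make the ordering problem concrete, here is a toy model of the two cases 
above. This is only a sketch, not driver code: the enum and the function name 
are made up for illustration, and the "compute first with CWSR" case returning 
success is an assumption that follows from the reasoning above rather than 
something that was tested.

```c
#include <stdbool.h>

/* Hypothetical toy model of the ordering described above; not amdgpu code. */
enum gang_first { GANG_GFX_FIRST, GANG_COMPUTE_FIRST };

/*
 * Returns true when the modelled world switch can complete and
 * false when it would lock up.
 */
static bool world_switch_completes(enum gang_first first, bool have_cwsr)
{
	if (first == GANG_GFX_FIRST) {
		/* The gfx draw is already running, so waiting for it ends. */
		return true;
	}

	/*
	 * Compute started first and is waiting for a gfx draw that has not
	 * started yet. Its waves can only be saved and removed with CWSR
	 * (assumption: with CWSR the preemption succeeds).
	 */
	return have_cwsr;
}
```

The only failing combination in this model is compute first without CWSR, 
which is the lockup described above.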

Regards,
Christian.

> 
> Regards
> Shaoyun.liu
> 
> -----Original Message-----
> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Alex 
> Deucher
> Sent: Friday, September 5, 2025 9:32 AM
> To: Christian König <ckoenig.leichtzumer...@gmail.com>
> Cc: Deucher, Alexander <alexander.deuc...@amd.com>; 
> amd-gfx@lists.freedesktop.org; timur.kris...@gmail.com
> Subject: Re: [PATCH 2/2] drm/amdgpu: reject gang submissions under SRIOV
> 
> On Fri, Sep 5, 2025 at 8:47 AM Christian König 
> <ckoenig.leichtzumer...@gmail.com> wrote:
>>
>> Gang submission means that the kernel driver guarantees that multiple
>> submissions are executed on the HW at the same time on different engines.
>>
>> Background is that those submissions then depend on each other and
>> each can't finish stand alone.
>>
>> SRIOV now uses world switch to preempt submissions on the engines to
>> allow sharing the HW resources between multiple VFs.
>>
>> The problem is now that the SRIOV world switch can't know about such
>> inter dependencies and will cause a timeout if it waits for a
>> partially running gang submission.
>>
>> To conclude SRIOV and gang submissions are fundamentally incompatible
>> at the moment. For now just disable them.
> 
> Are you sure about this?  Thinking about this more, most gang submissions are 
> between gfx and compute.  The entire GC block (gfx, compute, and sdma on 
> gfx10+) gets preempted on world switch so all of the active queues would be 
> preempted.  Everything gets resumed when the VF gets switched back.  VCN/JPEG 
> gets switched independently so that could be a problem if you have a gang 
> with VCN and GC, but I think all gangs within GC should in theory be ok.
> 
> Alex
> 
>>
>> Signed-off-by: Christian König <christian.koe...@amd.com>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 2ac9729e4c86..434a551365c7 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -286,7 +286,7 @@ static int amdgpu_cs_pass1(struct amdgpu_cs_parser *p,
>>                 }
>>         }
>>
>> -       if (!p->gang_size) {
>> +       if (!p->gang_size || (amdgpu_sriov_vf(p->adev) && p->gang_size > 1)) {
>>                 ret = -EINVAL;
>>                 goto free_all_kdata;
>>         }
>> --
>> 2.43.0
>>
