On 24.09.25 01:00, [email protected] wrote:
> On Tue, 2025-09-23 at 15:10 +0200, Christian König wrote:
>> The Constant Engine found on gfx6-gfx10 HW has been a notorious source of
>> problems.
>>
>> RADV never used it in the first place; radeonsi only used it for a few
>> releases around 2017 for gfx6-gfx9 before dropping support for it as well.
>>
>> While investigating another problem I just recently found that submitting
>> to the CE seems to have been completely broken on gfx9 for quite a while.
>>
>> Since nobody complained about that problem, it most likely means that
>> nobody is using any of the affected radeonsi versions on current Linux
>> kernels any more.
>>
>> So to potentially phase out the support for the CE and eliminate another
>> source of problems, block submitting CE IBs unless it is enabled again
>> using a debug flag.
>>
>> Signed-off-by: Christian König <[email protected]>
> 
> Acked-by: Timur Kristóf <[email protected]>
> 
> Hi Christian,
> 
> Would you be open to receiving a patch to stop emitting the CE related
> workarounds when the CE is not enabled?
> 
> Alternatively, could we stop emitting them altogether now that the CE
> is disabled by default?

Not yet. I want to push that upstream, wait for quite a while and, if nobody
complains, completely remove the CE support, including all the extra
overhead we currently incur for it in the submission path.

> Also, should the new debug flag be documented?

Where should we put that? I already noted how to enable it again in the 
ratelimited error message printed when you try to use the CE.
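
For reference, a minimal userspace sketch of how the value in that message relates to the new flag and to the check added in amdgpu_cs_p2_ib(). All SKETCH_* and sketch_* names are made up for illustration, and the CE flag value is a placeholder rather than a copy of the UAPI header; this only mirrors the logic of the quoted patch and is not the driver code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define SKETCH_BIT(n) (1u << (n))

/* Mirrors AMDGPU_DEBUG_ENABLE_CE_CS = BIT(10) from the patch. */
#define SKETCH_DEBUG_ENABLE_CE_CS SKETCH_BIT(10)

/* Placeholder (assumption): the real AMDGPU_IB_FLAG_CE lives in the amdgpu UAPI header. */
#define SKETCH_IB_FLAG_CE SKETCH_BIT(0)

/* Bit 10 is the 0x400 named in the rate-limited error message. */
_Static_assert(SKETCH_DEBUG_ENABLE_CE_CS == 0x400, "bit 10 must equal 0x400");

/* True when the debug mask re-enables CE command submission. */
static bool sketch_ce_cs_enabled(uint32_t debug_mask)
{
	return (debug_mask & SKETCH_DEBUG_ENABLE_CE_CS) != 0;
}

/* Reject CE IBs unless the debug bit is set; everything else passes. */
static int sketch_check_ib(uint32_t debug_mask, uint32_t ib_flags)
{
	if (!sketch_ce_cs_enabled(debug_mask) && (ib_flags & SKETCH_IB_FLAG_CE))
		return -EINVAL;

	return 0;
}
```

With a debug mask of 0 a CE IB is rejected with -EINVAL, with 0x400 set it passes, and IBs without the CE flag are unaffected either way.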

Regards,
Christian.

> 
> Thanks & best regards,
> Timur
> 
> 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  | 6 ++++++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 +++++++-
>>  3 files changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> index 2a0df4cabb99..6f5b4a0e0a34 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>> @@ -1290,6 +1290,7 @@ struct amdgpu_device {
>>      bool                            debug_disable_gpu_ring_reset;
>>      bool                            debug_vm_userptr;
>>      bool                            debug_disable_ce_logs;
>> +    bool                            debug_enable_ce_cs;
>>  
>>      /* Protection for the following isolation structure */
>>      struct mutex                    enforce_isolation_mutex;
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 744e6ff69814..322890e2c899 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -364,6 +364,12 @@ static int amdgpu_cs_p2_ib(struct amdgpu_cs_parser *p,
>>      if (p->uf_bo && ring->funcs->no_user_fence)
>>              return -EINVAL;
>>  
>> +    if (!p->adev->debug_enable_ce_cs &&
>> +        chunk_ib->flags & AMDGPU_IB_FLAG_CE) {
>> +            dev_err_ratelimited(p->adev->dev, "CE CS is blocked, use debug=0x400 to override\n");
>> +            return -EINVAL;
>> +    }
>> +
>>      if (chunk_ib->ip_type == AMDGPU_HW_IP_GFX &&
>>          chunk_ib->flags & AMDGPU_IB_FLAG_PREEMPT) {
>>              if (chunk_ib->flags & AMDGPU_IB_FLAG_CE)
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> index ece251cbe8c3..3b3fc734c0f8 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
>> @@ -144,7 +144,8 @@ enum AMDGPU_DEBUG_MASK {
>>      AMDGPU_DEBUG_DISABLE_GPU_RING_RESET = BIT(6),
>>      AMDGPU_DEBUG_SMU_POOL = BIT(7),
>>      AMDGPU_DEBUG_VM_USERPTR = BIT(8),
>> -    AMDGPU_DEBUG_DISABLE_RAS_CE_LOG = BIT(9)
>> +    AMDGPU_DEBUG_DISABLE_RAS_CE_LOG = BIT(9),
>> +    AMDGPU_DEBUG_ENABLE_CE_CS = BIT(10)
>>  };
>>  
>>  unsigned int amdgpu_vram_limit = UINT_MAX;
>> @@ -2289,6 +2290,11 @@ static void amdgpu_init_debug_options(struct amdgpu_device *adev)
>>              pr_info("debug: disable kernel logs of correctable errors\n");
>>              adev->debug_disable_ce_logs = true;
>>      }
>> +
>> +    if (amdgpu_debug_mask & AMDGPU_DEBUG_ENABLE_CE_CS) {
>> +            pr_info("debug: allowing command submission to CE engine\n");
>> +            adev->debug_enable_ce_cs = true;
>> +    }
>>  }
>>  
>>  static unsigned long amdgpu_fix_asic_type(struct pci_dev *pdev, unsigned long flags)
