Hi Lijo,

On 2/29/2024 3:33 PM, Lazar, Lijo wrote:
> 
> 
> On 2/29/2024 11:49 AM, Ma Jun wrote:
>> Check return value of amdgpu_device_baco_enter/exit and print
>> warning message because these errors may cause runtime resume failure
>>
>> Signed-off-by: Ma Jun <[email protected]>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 ++++++++++++++++------
>>  1 file changed, 22 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> index e68bd6f8a6a4..4928b588cd12 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
>> @@ -6000,15 +6000,24 @@ int amdgpu_device_baco_enter(struct drm_device *dev)
>>  {
>>      struct amdgpu_device *adev = drm_to_adev(dev);
>>      struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>> +    int ret = 0;
>>  
>> -    if (!amdgpu_device_supports_baco(dev))
>> -            return -ENOTSUPP;
>> +    if (!amdgpu_device_supports_baco(dev)) {
>> +            ret = -ENOTSUPP;
>> +            goto baco_error;
>> +    }
>>  
>>      if (ras && adev->ras_enabled &&
>>          adev->nbio.funcs->enable_doorbell_interrupt)
>>              adev->nbio.funcs->enable_doorbell_interrupt(adev, false);
>>  
>> -    return amdgpu_dpm_baco_enter(adev);
>> +    ret = amdgpu_dpm_baco_enter(adev);
>> +
>> +baco_error:
>> +    if (ret)
>> +            dev_warn(adev->dev, "warning: device fails to enter baco. 
>> ret=%d\n", ret);
>> +
> 
> This doesn't look like a real case, moreover the warning message is

In fact this is a case that actually happened.

When amdgpu_device_supports_baco returns with error because of some reasons,
device will enter runtime suspend without calling the  
amdgpu_device_baco_enter()
and without any warning message being printed. Then, device is usually fails
to resume.
So, I add this message as a warning to help us find the real reason.

> misleading. If the device doesn't support baco, driver is not supposed
> to call it for runpm purpose -
> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L2664
> 
I changed this code in another patch.
https://lists.freedesktop.org/archives/amd-gfx/2024-February/104929.html

Regards,
Ma Jun

> Thanks,
> Lijo
> 
>> +    return ret;
>>  }
>>  
>>  int amdgpu_device_baco_exit(struct drm_device *dev)
>> @@ -6017,12 +6026,14 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
>>      struct amdgpu_ras *ras = amdgpu_ras_get_context(adev);
>>      int ret = 0;
>>  
>> -    if (!amdgpu_device_supports_baco(dev))
>> -            return -ENOTSUPP;
>> +    if (!amdgpu_device_supports_baco(dev)) {
>> +            ret = -ENOTSUPP;
>> +            goto baco_error;
>> +    }
>>  
>>      ret = amdgpu_dpm_baco_exit(adev);
>>      if (ret)
>> -            return ret;
>> +            goto baco_error;
>>  
>>      if (ras && adev->ras_enabled &&
>>          adev->nbio.funcs->enable_doorbell_interrupt)
>> @@ -6032,7 +6043,11 @@ int amdgpu_device_baco_exit(struct drm_device *dev)
>>          adev->nbio.funcs->clear_doorbell_interrupt)
>>              adev->nbio.funcs->clear_doorbell_interrupt(adev);
>>  
>> -    return 0;
>> +baco_error:
>> +    if (ret)
>> +            dev_warn(adev->dev, "warning: device fails to exit from baco. 
>> ret=%d\n", ret);
>> +
>> +    return ret;
>>  }
>>  
>>  /**

Reply via email to