On Mon, Oct 9, 2023 at 4:41 AM Christian König
<ckoenig.leichtzumer...@gmail.com> wrote:
>
> Am 06.10.23 um 16:24 schrieb Alex Deucher:
> > On Fri, Oct 6, 2023 at 4:32 AM Christian König
> > <ckoenig.leichtzumer...@gmail.com> wrote:
> >> Am 06.10.23 um 07:21 schrieb Lijo Lazar:
> >>> Add sysfs attribute to read power management log. A snapshot is
> >>> captured to the buffer when the attribute is read.
> >>>
> >>> Signed-off-by: Lijo Lazar <lijo.la...@amd.com>
> >>> ---
> >>>
> >>> v2: Pass PAGE_SIZE as the max size of input buffer
> >>>
> >>>    drivers/gpu/drm/amd/pm/amdgpu_pm.c | 40 ++++++++++++++++++++++++++++++
> >>>    1 file changed, 40 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
> >>> b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> >>> index 4c65a2fac028..5a1d21c52672 100644
> >>> --- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> >>> +++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
> >>> @@ -1794,6 +1794,44 @@ static ssize_t amdgpu_set_apu_thermal_cap(struct 
> >>> device *dev,
> >>>        return count;
> >>>    }
> >>>
> >>> +static int amdgpu_pmlog_attr_update(struct amdgpu_device *adev,
> >>> +                                 struct amdgpu_device_attr *attr,
> >>> +                                 uint32_t mask,
> >>> +                                 enum amdgpu_device_attr_states *states)
> >>> +{
> >>> +     if (amdgpu_dpm_get_pm_log(adev, NULL, 0) == -EOPNOTSUPP)
> >>> +             *states = ATTR_STATE_UNSUPPORTED;
> >>> +
> >>> +     return 0;
> >>> +}
> >>> +
> >>> +static ssize_t amdgpu_get_pmlog(struct device *dev,
> >>> +                             struct device_attribute *attr, char *buf)
> >>> +{
> >>> +     struct drm_device *ddev = dev_get_drvdata(dev);
> >>> +     struct amdgpu_device *adev = drm_to_adev(ddev);
> >>> +     ssize_t size = 0;
> >>> +     int ret;
> >>> +
> >>> +     if (amdgpu_in_reset(adev))
> >>> +             return -EPERM;
> >> Please stop using amdgpu_in_reset() for stuff like that altogether.
> >>
> >> When this is reset critical it should grab the reset rw semaphore. If it
> >> isn't than that check isn't necessary.
> > All of the power related sysfs files have this check.  It was
> > originally added because users have processes running which poll
> > various hwmon files at regular intervals and since SMU also handles
> > reset, we don't want to get a metrics table request while a reset is
> > happening otherwise the SMU gets confused.
>
> Then this approach is completely broken. Nothing prevents the reset from
> starting right after doing the check.
>
> If we need exclusive access to the SMU then we should just grab a lock.

Right, but the entire file should be fixed.  It's sort of orthogonal
to this patch.

Alex

>
> Christian.
>
> >
> > Alex
> >
> >> Regards,
> >> Christian.
> >>
> >>> +     if (adev->in_suspend && !adev->in_runpm)
> >>> +             return -EPERM;
> >>> +
> >>> +     ret = pm_runtime_get_sync(ddev->dev);
> >>> +     if (ret < 0) {
> >>> +             pm_runtime_put_autosuspend(ddev->dev);
> >>> +             return ret;
> >>> +     }
> >>> +
> >>> +     size = amdgpu_dpm_get_pm_log(adev, buf, PAGE_SIZE);
> >>> +
> >>> +     pm_runtime_mark_last_busy(ddev->dev);
> >>> +     pm_runtime_put_autosuspend(ddev->dev);
> >>> +
> >>> +     return size;
> >>> +}
> >>> +
> >>>    /**
> >>>     * DOC: gpu_metrics
> >>>     *
> >>> @@ -2091,6 +2129,8 @@ static struct amdgpu_device_attr 
> >>> amdgpu_device_attrs[] = {
> >>>        AMDGPU_DEVICE_ATTR_RW(smartshift_bias,                          
> >>> ATTR_FLAG_BASIC,
> >>>                              .attr_update = ss_bias_attr_update),
> >>>        AMDGPU_DEVICE_ATTR_RW(xgmi_plpd_policy,                         
> >>> ATTR_FLAG_BASIC),
> >>> +     AMDGPU_DEVICE_ATTR_RO(pmlog,                                    
> >>> ATTR_FLAG_BASIC,
> >>> +                           .attr_update = amdgpu_pmlog_attr_update),
> >>>    };
> >>>
> >>>    static int default_attr_update(struct amdgpu_device *adev, struct 
> >>> amdgpu_device_attr *attr,
>

Reply via email to