On Fri, Apr 22, 2022 at 3:54 PM Wong, Alice <[email protected]> wrote:
>
> [AMD Official Use Only]
>
> Hi Alex,
>
> The attached patch freed most of the allocated memory except for one 
> allocated by psp_tmr_init during psp_load_fw.
> Combination of the attached patch and calling psp_hw_fini when psp_hw_init 
> failed would fix the issue.
>
> As a proper fix, we can call psp_tmr_terminate in psp_load_fw when 
> psp_load_non_psp_fw failed. (attached patch)
> We can't move psp_tmr_init to sw_init because we need to load toc to get the 
> TMR size.
> Do you have any concerns with this approach?

That only covers failures in hw_init().  It doesn't handle resume().
Looks like all of the ta functions also potentially leak.  I'm working
on a cleanup to handle all of these.

Alex

>
> Alice
>
>
> -----Original Message-----
> From: Alex Deucher <[email protected]>
> Sent: April 21, 2022 1:31 AM
> To: Wong, Alice <[email protected]>
> Cc: amd-gfx list <[email protected]>
> Subject: Re: [PATCH] drm/amdgpu: clean up psp ip if hw_init failed v2
>
> On Wed, Apr 20, 2022 at 5:48 PM Alice Wong <[email protected]> wrote:
> >
> > If at any point psp_hw_init failed, psp_hw_fini would not be called
> > during unload due to ip_blocks[PSP].status.hw not being set to true.
> > This could cause a memory leak when the driver unloads.
> > As a rule of thumb, each IP block should cleanup themselves when their
> > hw_init fails. Only previously intialized blocks should be cleaned up
> > by the common framework.
> >
> > v1: Call IP blocks' respective hw_fini when hw_init failed from
> >     the common framework
> > v2: Call psp_hw_fini when psp_hw_init failed.
> >
> > Signed-off-by: Alice Wong <[email protected]>
>
> I don't think this is a good idea.  The hw programming in hw_fini makes no 
> sense if the driver never set it up in the first place if hw_init failed 
> before initializing the hw.  It would be better to just properly unwind the 
> relevant functions.  Presumably the problem you are seeing is the failure to 
> free the GPU memory allocated in fw_fini, depending on where it fails.  We 
> should just allocate the memory in sw_init; that is what we do in other IPs.  
> Does the attached patch fix the issue you are seeing?
>
> Alex
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 57
> > +++++++++++++------------
> >  1 file changed, 29 insertions(+), 28 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > index 5b9e48d44f5b..52b14efa848a 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
> > @@ -2537,6 +2537,34 @@ static int psp_load_fw(struct amdgpu_device *adev)
> >         return ret;
> >  }
> >
> > +static int psp_hw_fini(void *handle)
> > +{
> > +       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > +       struct psp_context *psp = &adev->psp;
> > +
> > +       if (psp->ta_fw) {
> > +               psp_ras_terminate(psp);
> > +               psp_securedisplay_terminate(psp);
> > +               psp_rap_terminate(psp);
> > +               psp_dtm_terminate(psp);
> > +               psp_hdcp_terminate(psp);
> > +       }
> > +
> > +       psp_asd_terminate(psp);
> > +
> > +       psp_tmr_terminate(psp);
> > +       psp_ring_destroy(psp, PSP_RING_TYPE__KM);
> > +
> > +       amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> > +                             &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> > +       amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> > +                             &psp->fence_buf_mc_addr, &psp->fence_buf);
> > +       amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> > +                             (void **)&psp->cmd_buf_mem);
> > +
> > +       return 0;
> > +}
> > +
> >  static int psp_hw_init(void *handle)
> >  {
> >         int ret;
> > @@ -2563,37 +2591,10 @@ static int psp_hw_init(void *handle)
> >  failed:
> >         adev->firmware.load_type = AMDGPU_FW_LOAD_DIRECT;
> >         mutex_unlock(&adev->firmware.mutex);
> > +       psp_hw_fini(handle);
> >         return -EINVAL;
> >  }
> >
> > -static int psp_hw_fini(void *handle)
> > -{
> > -       struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > -       struct psp_context *psp = &adev->psp;
> > -
> > -       if (psp->ta_fw) {
> > -               psp_ras_terminate(psp);
> > -               psp_securedisplay_terminate(psp);
> > -               psp_rap_terminate(psp);
> > -               psp_dtm_terminate(psp);
> > -               psp_hdcp_terminate(psp);
> > -       }
> > -
> > -       psp_asd_terminate(psp);
> > -
> > -       psp_tmr_terminate(psp);
> > -       psp_ring_destroy(psp, PSP_RING_TYPE__KM);
> > -
> > -       amdgpu_bo_free_kernel(&psp->fw_pri_bo,
> > -                             &psp->fw_pri_mc_addr, &psp->fw_pri_buf);
> > -       amdgpu_bo_free_kernel(&psp->fence_buf_bo,
> > -                             &psp->fence_buf_mc_addr, &psp->fence_buf);
> > -       amdgpu_bo_free_kernel(&psp->cmd_buf_bo, &psp->cmd_buf_mc_addr,
> > -                             (void **)&psp->cmd_buf_mem);
> > -
> > -       return 0;
> > -}
> > -
> >  static int psp_suspend(void *handle)
> >  {
> >         int ret;
> > --
> > 2.25.1
> >

Reply via email to