It's in amdgpu_pci_resume.

Andrey, shall we modify the code accordingly in amdgpu_pci_resume as well? 
Otherwise, an unset/unlock leak will happen when pci_channel_state != 
pci_channel_io_frozen.

Regards,
Guchun

-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Christian 
König
Sent: Thursday, April 14, 2022 2:40 PM
To: Grodzovsky, Andrey <[email protected]>; 
[email protected]
Cc: Antonovitch, Anatoli <[email protected]>
Subject: Re: [PATCH] drm/amdgpu: Move reset domain locking in DPC handler



Am 13.04.22 um 21:31 schrieb Andrey Grodzovsky:
> Lock reset domain unconditionally because on resume we unlock it 
> unconditionally.
> This solved mutex deadlock when handling both FATAL and non FATAL PCI 
> errors one after another.
>
> Signed-off-by: Andrey Grodzovsky <[email protected]>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++++++-------
>   1 file changed, 7 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1cc488a767d8..c65f25e3a0fc 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -5531,18 +5531,18 @@ pci_ers_result_t 
> amdgpu_pci_error_detected(struct pci_dev *pdev, pci_channel_sta
>   
>       adev->pci_channel_state = state;
>   
> +     /*
> +      * Locking adev->reset_domain->sem will prevent any external access
> +      * to GPU during PCI error recovery
> +      */
> +     amdgpu_device_lock_reset_domain(adev->reset_domain);
> +     amdgpu_device_set_mp1_state(adev);
> +
>       switch (state) {
>       case pci_channel_io_normal:
>               return PCI_ERS_RESULT_CAN_RECOVER;

BTW: Where are we unlocking that again?

>       /* Fatal error, prepare for slot reset */
>       case pci_channel_io_frozen:
> -             /*
> -              * Locking adev->reset_domain->sem will prevent any external 
> access
> -              * to GPU during PCI error recovery
> -              */
> -             amdgpu_device_lock_reset_domain(adev->reset_domain);
> -             amdgpu_device_set_mp1_state(adev);
> -
>               /*
>                * Block any work scheduling as we do for regular GPU reset
>                * for the duration of the recovery

Reply via email to