[Public]

Hi Alex and Lijo,

Revised the patch based on your  comments. Please review [PATCH] drm/amdgpu: 
Put GPU CG/PG ungate in device_fini_hw and device_halt

Best Regards,
Yifan

-----Original Message-----
From: Lazar, Lijo <[email protected]>
Sent: Saturday, January 31, 2026 2:05 PM
To: Alex Deucher <[email protected]>
Cc: Deucher, Alexander <[email protected]>; 
[email protected]; Zhang, Yifan <[email protected]>
Subject: Re: [PATCH 01/12] drm/amdgpu/gfx9: handle gfxoff in interrupt set 
functions



On 30-Jan-26 9:19 PM, Alex Deucher wrote:
> On Fri, Jan 30, 2026 at 10:05 AM Lazar, Lijo <[email protected]> wrote:
>>
>>
>>
>> On 30-Jan-26 8:18 PM, Alex Deucher wrote:
>>> On Tue, Jan 27, 2026 at 12:02 AM Lazar, Lijo <[email protected]> wrote:
>>>>
>>>>
>>>>
>>>> On 27-Jan-26 1:37 AM, Alex Deucher wrote:
>>>>> Need to make sure gfxoff is disallowed when we touch GC registers
>>>>> over MMIO.
>>>>>
>>>>
>>>> I think interrupt enable/disable sequence is only supposed to be
>>>> done under ip power/clock ungate sequence like in hw
>>>> init/resume/suspend sequences. The fix probably should be in the
>>>> higher level sequence which doesn't take care of that.
>>>
>>> In that case, Yifan's original patch is probably fine as is.
>>> Someone should still double check all of the call paths though.
>>>
>>
>> Original one is also not correct. For example, if this is happening
>> after reset re initialization, only that sequence needs to be
>> modified to keep the affected IPs ungated during reinit.
>
> It also gets called in amdgpu_device_fini_hw().
>

Checked now. What I meant is - we should call ip cg/pg ungate insid
amdgpu_device_fini_hw() to protect all other accesses within that sequence and 
not restricted to irq_disable/enable all. Now that happens a while later in 
amdgpu_device_ip_fini_early.

Basically, the high level sequence needs IP pg ungate protection.

Thanks,
Lijo

> Alex
>
>>
>> Thanks,
>> Lijo
>>
>>> Alex
>>>
>>>>
>>>> Thanks,
>>>> Lijo
>>>>
>>>>> Cc: Yifan Zhang <[email protected]>
>>>>> Signed-off-by: Alex Deucher <[email protected]>
>>>>> ---
>>>>>     drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 10 ++++++++++
>>>>>     1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>> index 36f0300a21bfa..05178ee8e0e3a 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>>>> @@ -6056,6 +6056,7 @@ static int gfx_v9_0_set_priv_reg_fault_state(struct 
>>>>> amdgpu_device *adev,
>>>>>         u32 cp_int_cntl_reg, cp_int_cntl;
>>>>>         int i, j;
>>>>>
>>>>> +     amdgpu_gfx_off_ctrl(adev, false);
>>>>>         switch (state) {
>>>>>         case AMDGPU_IRQ_STATE_DISABLE:
>>>>>         case AMDGPU_IRQ_STATE_ENABLE:
>>>>> @@ -6080,6 +6081,7 @@ static int gfx_v9_0_set_priv_reg_fault_state(struct 
>>>>> amdgpu_device *adev,
>>>>>         default:
>>>>>                 break;
>>>>>         }
>>>>> +     amdgpu_gfx_off_ctrl(adev, true);
>>>>>
>>>>>         return 0;
>>>>>     }
>>>>> @@ -6092,6 +6094,7 @@ static int gfx_v9_0_set_bad_op_fault_state(struct 
>>>>> amdgpu_device *adev,
>>>>>         u32 cp_int_cntl_reg, cp_int_cntl;
>>>>>         int i, j;
>>>>>
>>>>> +     amdgpu_gfx_off_ctrl(adev, false);
>>>>>         switch (state) {
>>>>>         case AMDGPU_IRQ_STATE_DISABLE:
>>>>>         case AMDGPU_IRQ_STATE_ENABLE:
>>>>> @@ -6116,6 +6119,7 @@ static int gfx_v9_0_set_bad_op_fault_state(struct 
>>>>> amdgpu_device *adev,
>>>>>         default:
>>>>>                 break;
>>>>>         }
>>>>> +     amdgpu_gfx_off_ctrl(adev, true);
>>>>>
>>>>>         return 0;
>>>>>     }
>>>>> @@ -6125,6 +6129,7 @@ static int 
>>>>> gfx_v9_0_set_priv_inst_fault_state(struct amdgpu_device *adev,
>>>>>                                               unsigned type,
>>>>>                                               enum amdgpu_interrupt_state 
>>>>> state)
>>>>>     {
>>>>> +     amdgpu_gfx_off_ctrl(adev, false);
>>>>>         switch (state) {
>>>>>         case AMDGPU_IRQ_STATE_DISABLE:
>>>>>         case AMDGPU_IRQ_STATE_ENABLE:
>>>>> @@ -6135,6 +6140,7 @@ static int 
>>>>> gfx_v9_0_set_priv_inst_fault_state(struct amdgpu_device *adev,
>>>>>         default:
>>>>>                 break;
>>>>>         }
>>>>> +     amdgpu_gfx_off_ctrl(adev, true);
>>>>>
>>>>>         return 0;
>>>>>     }
>>>>> @@ -6152,6 +6158,7 @@ static int gfx_v9_0_set_cp_ecc_error_state(struct 
>>>>> amdgpu_device *adev,
>>>>>                                               unsigned type,
>>>>>                                               enum amdgpu_interrupt_state 
>>>>> state)
>>>>>     {
>>>>> +     amdgpu_gfx_off_ctrl(adev, false);
>>>>>         switch (state) {
>>>>>         case AMDGPU_IRQ_STATE_DISABLE:
>>>>>                 WREG32_FIELD15(GC, 0, CP_INT_CNTL_RING0, @@
>>>>> -6173,6 +6180,7 @@ static int gfx_v9_0_set_cp_ecc_error_state(struct 
>>>>> amdgpu_device *adev,
>>>>>         default:
>>>>>                 break;
>>>>>         }
>>>>> +     amdgpu_gfx_off_ctrl(adev, true);
>>>>>
>>>>>         return 0;
>>>>>     }
>>>>> @@ -6183,6 +6191,7 @@ static int gfx_v9_0_set_eop_interrupt_state(struct 
>>>>> amdgpu_device *adev,
>>>>>                                             unsigned type,
>>>>>                                             enum amdgpu_interrupt_state 
>>>>> state)
>>>>>     {
>>>>> +     amdgpu_gfx_off_ctrl(adev, false);
>>>>>         switch (type) {
>>>>>         case AMDGPU_CP_IRQ_GFX_ME0_PIPE0_EOP:
>>>>>                 gfx_v9_0_set_gfx_eop_interrupt_state(adev, state);
>>>>> @@ -6214,6 +6223,7 @@ static int gfx_v9_0_set_eop_interrupt_state(struct 
>>>>> amdgpu_device *adev,
>>>>>         default:
>>>>>                 break;
>>>>>         }
>>>>> +     amdgpu_gfx_off_ctrl(adev, true);
>>>>>         return 0;
>>>>>     }
>>>>>
>>>>
>>

Reply via email to