Thank you Christian. Maybe in the future we can make the invalidation ack to be 
interrupt based instead of polling.

Regards,
Oak

-----Original Message-----
From: Koenig, Christian <[email protected]> 
Sent: Wednesday, November 20, 2019 10:14 AM
To: Zeng, Oak <[email protected]>; Liu, Monk <[email protected]>; Zhu, Changfeng 
<[email protected]>; Xiao, Jack <[email protected]>; Zhou1, Tao 
<[email protected]>; Huang, Ray <[email protected]>; Huang, Shimmer 
<[email protected]>; [email protected]
Subject: Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround 
in amdgpu_virt

Hi Oak,

> [Oak] I am not familiar about the power gating sequence but from first 
> glance, should the power gating sequence make sure that HW is ready (idle) 
> for power gating before put the system to power gating?
The problem is that the hardware is actually idle when gated.

See what happens is the following:

1. Ring A sends an invalidate command to VM invalidation engine X.

2. VM invalidation engine X wakes up and is ungated because it now has work.

3. VM invalidation engine X finishes the invalidation and goes back to be gated 
again.

4. Now ring A polls for the invalidation on engine X to complete, but since it 
got back to be gated again it has forgotten that we have finished that 
invalidation. BAM! Ring A will poll forever.

Regards,
Christian.

Am 20.11.19 um 16:04 schrieb Zeng, Oak:
> See an inline comment
>
> Regards,
> Oak
>
> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of 
> Christian König
> Sent: Wednesday, November 20, 2019 8:21 AM
> To: Liu, Monk <[email protected]>; Zhu, Changfeng 
> <[email protected]>; Xiao, Jack <[email protected]>; Zhou1, Tao 
> <[email protected]>; Huang, Ray <[email protected]>; Huang, Shimmer 
> <[email protected]>; [email protected]
> Subject: Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore 
> workaround in amdgpu_virt
>
> Hi Monk,
>
> this is a fix for power gating the MMHUB.
>
> Basic problem is that the MMHUB can power gate while an invalidation 
> is in progress [Oak] I am not familiar about the power gating sequence but 
> from first glance, should the power gating sequence make sure that HW is 
> ready (idle) for power gating before put the system to power gating? E.g., 
> before we put the system to power gating, should we enquiry each HW blocks to 
> see whether the HW is idle? If not (like the case you mentioned some 
> invalidation activities is still ongoing) the power gating condition is not 
> mature and we should we wait. Or if the power gating is trigger/initiated by 
> HW (I am not sure), HW should guarantee it is idle?
>
>   which looses all bits in the ACK register and so deadlocks the engine 
> waiting for the invalidation to finish.
>
> This bug is hit immediately when we enable power gating of the MMHUB.
>
> Regards,
> Christian.
>
> Am 20.11.19 um 14:18 schrieb Liu, Monk:
>> Hi Changfeng
>>
>> Firs of all, there is no power-gating off circle involved in AMDGPU 
>> SRIOV, since we don't allow VF/VM do such things so I do feel strange 
>> why you post something like this Especially on VEGA10 serials which 
>> looks doesn't have any issue on those gpu_flush part
>>
>> Here is my questions for you:
>> 1) Can you point me what issue had you been experienced ? and how to 
>> repro the bug
>> 2) if you do hit some issues, did you verified that your patch can fix it ?
>>
>> besides
>>
>> /Monk
>>
>> -----邮件原件-----
>> 发件人: amd-gfx <[email protected]> 代表 Changfeng.Zhu
>> 发送时间: 2019年11月20日 17:14
>> 收件人: Koenig, Christian <[email protected]>; Xiao, Jack 
>> <[email protected]>; Zhou1, Tao <[email protected]>; Huang, Ray 
>> <[email protected]>; Huang, Shimmer <[email protected]>; 
>> [email protected]
>> 抄送: Zhu, Changfeng <[email protected]>
>> 主题: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in 
>> amdgpu_virt
>>
>> From: changzhu <[email protected]>
>>
>> It may lose gpuvm invalidate acknowldege state across power-gating off 
>> cycle. To avoid this issue in virt invalidation, add semaphore acquire 
>> before invalidation and semaphore release after invalidation.
>>
>> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b
>> Signed-off-by: changzhu <[email protected]>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++++++++++++++++++++++--  
>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h |  3 ++-
>>    drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c    |  3 ++-
>>    3 files changed, 28 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> index f04eb1a64271..70ffaf91cd12 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device 
>> *adev, uint32_t reg, uint32_t v)
>>    
>>    void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>                                      uint32_t reg0, uint32_t reg1,
>> -                                    uint32_t ref, uint32_t mask)
>> +                                    uint32_t ref, uint32_t mask,
>> +                                    uint32_t sem)
>>    {
>>      struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>      struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ void 
>> amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev,
>>      uint32_t seq;
>>    
>>      spin_lock_irqsave(&kiq->ring_lock, flags);
>> -    amdgpu_ring_alloc(ring, 32);
>> +    amdgpu_ring_alloc(ring, 60);
>> +
>> +    /*
>> +     * It may lose gpuvm invalidate acknowldege state across power-gating
>> +     * off cycle, add semaphore acquire before invalidation and semaphore
>> +     * release after invalidation to avoid entering power gated state
>> +     * to WA the Issue
>> +     */
>> +
>> +    /* a read return value of 1 means semaphore acuqire */
>> +    if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
>> +        ring->funcs->vmhub == AMDGPU_MMHUB_1)
>> +    amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1);
>> +
>>      amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>>                                          ref, mask);
>> +    /*
>> +     * add semaphore release after invalidation,
>> +     * write with 0 means semaphore release
>> +     */
>> +    if (ring->funcs->vmhub == AMDGPU_MMHUB_0 ||
>> +        ring->funcs->vmhub == AMDGPU_MMHUB_1)
>> +    amdgpu_ring_emit_wreg(ring, sem, 0);
>> +
>>      amdgpu_fence_emit_polling(ring, &seq);
>>      amdgpu_ring_commit(ring);
>>      spin_unlock_irqrestore(&kiq->ring_lock, flags); diff --git 
>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> index b0b2bdc750df..bda6a2f37dc0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h
>> @@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device 
>> *adev, uint32_t reg);  void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, 
>> uint32_t reg, uint32_t v);  void amdgpu_virt_kiq_reg_write_reg_wait(struct 
>> amdgpu_device *adev,
>>                                      uint32_t reg0, uint32_t rreg1,
>> -                                    uint32_t ref, uint32_t mask);
>> +                                    uint32_t ref, uint32_t mask,
>> +                                    uint32_t sem);
>>    int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool 
>> init);  int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, 
>> bool init);  int amdgpu_virt_reset_gpu(struct amdgpu_device *adev); 
>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> index f25cd97ba5f2..1ae59af7836a 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>> @@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device 
>> *adev, uint32_t vmid,
>>                      !adev->in_gpu_reset) {
>>              uint32_t req = hub->vm_inv_eng0_req + eng;
>>              uint32_t ack = hub->vm_inv_eng0_ack + eng;
>> +            uint32_t sem = hub->vm_inv_eng0_sem + eng;
>>    
>>              amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp,
>> -                            1 << vmid);
>> +                                               1 << vmid, sem);
>>              return;
>>      }
>>    
>> --
>> 2.17.1
>>
>> _______________________________________________
>> amd-gfx mailing list
>> [email protected]
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> _______________________________________________
> amd-gfx mailing list
> [email protected]
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to