Thank you Christian. Maybe in the future we can make the invalidation ack to be interrupt based instead of polling.
Regards, Oak -----Original Message----- From: Koenig, Christian <[email protected]> Sent: Wednesday, November 20, 2019 10:14 AM To: Zeng, Oak <[email protected]>; Liu, Monk <[email protected]>; Zhu, Changfeng <[email protected]>; Xiao, Jack <[email protected]>; Zhou1, Tao <[email protected]>; Huang, Ray <[email protected]>; Huang, Shimmer <[email protected]>; [email protected] Subject: Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in amdgpu_virt Hi Oak, > [Oak] I am not familiar about the power gating sequence but from first > glance, should the power gating sequence make sure that HW is ready (idle) > for power gating before put the system to power gating? The problem is that the hardware is actually idle when gated. See what happens is the following: 1. Ring A sends an invalidate command to VM invalidation engine X. 2. VM invalidation engine X wakes up and is ungated because it now has work. 3. VM invalidation engine X finishes the invalidation and goes back to be gated again. 4. Now ring A polls for the invalidation on engine X to complete, but since it got back to be gated again it has forgotten that we have finished that invalidation. BAM! Ring A will poll forever. Regards, Christian. Am 20.11.19 um 16:04 schrieb Zeng, Oak: > See an inline comment > > Regards, > Oak > > -----Original Message----- > From: amd-gfx <[email protected]> On Behalf Of > Christian König > Sent: Wednesday, November 20, 2019 8:21 AM > To: Liu, Monk <[email protected]>; Zhu, Changfeng > <[email protected]>; Xiao, Jack <[email protected]>; Zhou1, Tao > <[email protected]>; Huang, Ray <[email protected]>; Huang, Shimmer > <[email protected]>; [email protected] > Subject: Re: 答复: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore > workaround in amdgpu_virt > > Hi Monk, > > this is a fix for power gating the MMHUB. > > Basic problem is that the MMHUB can power gate while an invalidation > is in progress [Oak] I am not familiar about the power gating sequence but > from first glance, should the power gating sequence make sure that HW is > ready (idle) for power gating before put the system to power gating? E.g., > before we put the system to power gating, should we enquiry each HW blocks to > see whether the HW is idle? If not (like the case you mentioned some > invalidation activities is still ongoing) the power gating condition is not > mature and we should we wait. Or if the power gating is trigger/initiated by > HW (I am not sure), HW should guarantee it is idle? > > which looses all bits in the ACK register and so deadlocks the engine > waiting for the invalidation to finish. > > This bug is hit immediately when we enable power gating of the MMHUB. > > Regards, > Christian. > > Am 20.11.19 um 14:18 schrieb Liu, Monk: >> Hi Changfeng >> >> Firs of all, there is no power-gating off circle involved in AMDGPU >> SRIOV, since we don't allow VF/VM do such things so I do feel strange >> why you post something like this Especially on VEGA10 serials which >> looks doesn't have any issue on those gpu_flush part >> >> Here is my questions for you: >> 1) Can you point me what issue had you been experienced ? and how to >> repro the bug >> 2) if you do hit some issues, did you verified that your patch can fix it ? >> >> besides >> >> /Monk >> >> -----邮件原件----- >> 发件人: amd-gfx <[email protected]> 代表 Changfeng.Zhu >> 发送时间: 2019年11月20日 17:14 >> 收件人: Koenig, Christian <[email protected]>; Xiao, Jack >> <[email protected]>; Zhou1, Tao <[email protected]>; Huang, Ray >> <[email protected]>; Huang, Shimmer <[email protected]>; >> [email protected] >> 抄送: Zhu, Changfeng <[email protected]> >> 主题: [PATCH 1/2] drm/amdgpu: invalidate mmhub semphore workaround in >> amdgpu_virt >> >> From: changzhu <[email protected]> >> >> It may lose gpuvm invalidate acknowldege state across power-gating off >> cycle. To avoid this issue in virt invalidation, add semaphore acquire >> before invalidation and semaphore release after invalidation. >> >> Change-Id: Ie98304e475166b53eed033462d76423b6b0fc25b >> Signed-off-by: changzhu <[email protected]> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 26 ++++++++++++++++++++++-- >> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 3 ++- >> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 3 ++- >> 3 files changed, 28 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >> index f04eb1a64271..70ffaf91cd12 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >> @@ -135,7 +135,8 @@ void amdgpu_virt_kiq_wreg(struct amdgpu_device >> *adev, uint32_t reg, uint32_t v) >> >> void amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev, >> uint32_t reg0, uint32_t reg1, >> - uint32_t ref, uint32_t mask) >> + uint32_t ref, uint32_t mask, >> + uint32_t sem) >> { >> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >> struct amdgpu_ring *ring = &kiq->ring; @@ -144,9 +145,30 @@ void >> amdgpu_virt_kiq_reg_write_reg_wait(struct amdgpu_device *adev, >> uint32_t seq; >> >> spin_lock_irqsave(&kiq->ring_lock, flags); >> - amdgpu_ring_alloc(ring, 32); >> + amdgpu_ring_alloc(ring, 60); >> + >> + /* >> + * It may lose gpuvm invalidate acknowldege state across power-gating >> + * off cycle, add semaphore acquire before invalidation and semaphore >> + * release after invalidation to avoid entering power gated state >> + * to WA the Issue >> + */ >> + >> + /* a read return value of 1 means semaphore acuqire */ >> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 || >> + ring->funcs->vmhub == AMDGPU_MMHUB_1) >> + amdgpu_ring_emit_reg_wait(ring, sem, 0x1, 0x1); >> + >> amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1, >> ref, mask); >> + /* >> + * add semaphore release after invalidation, >> + * write with 0 means semaphore release >> + */ >> + if (ring->funcs->vmhub == AMDGPU_MMHUB_0 || >> + ring->funcs->vmhub == AMDGPU_MMHUB_1) >> + amdgpu_ring_emit_wreg(ring, sem, 0); >> + >> amdgpu_fence_emit_polling(ring, &seq); >> amdgpu_ring_commit(ring); >> spin_unlock_irqrestore(&kiq->ring_lock, flags); diff --git >> a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >> index b0b2bdc750df..bda6a2f37dc0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h >> @@ -295,7 +295,8 @@ uint32_t amdgpu_virt_kiq_rreg(struct amdgpu_device >> *adev, uint32_t reg); void amdgpu_virt_kiq_wreg(struct amdgpu_device *adev, >> uint32_t reg, uint32_t v); void amdgpu_virt_kiq_reg_write_reg_wait(struct >> amdgpu_device *adev, >> uint32_t reg0, uint32_t rreg1, >> - uint32_t ref, uint32_t mask); >> + uint32_t ref, uint32_t mask, >> + uint32_t sem); >> int amdgpu_virt_request_full_gpu(struct amdgpu_device *adev, bool >> init); int amdgpu_virt_release_full_gpu(struct amdgpu_device *adev, >> bool init); int amdgpu_virt_reset_gpu(struct amdgpu_device *adev); >> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >> index f25cd97ba5f2..1ae59af7836a 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >> @@ -448,9 +448,10 @@ static void gmc_v9_0_flush_gpu_tlb(struct amdgpu_device >> *adev, uint32_t vmid, >> !adev->in_gpu_reset) { >> uint32_t req = hub->vm_inv_eng0_req + eng; >> uint32_t ack = hub->vm_inv_eng0_ack + eng; >> + uint32_t sem = hub->vm_inv_eng0_sem + eng; >> >> amdgpu_virt_kiq_reg_write_reg_wait(adev, req, ack, tmp, >> - 1 << vmid); >> + 1 << vmid, sem); >> return; >> } >> >> -- >> 2.17.1 >> >> _______________________________________________ >> amd-gfx mailing list >> [email protected] >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > _______________________________________________ > amd-gfx mailing list > [email protected] > https://lists.freedesktop.org/mailman/listinfo/amd-gfx _______________________________________________ amd-gfx mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/amd-gfx
