RE: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault

Zeng, Oak Sun, 17 Nov 2019 19:42:35 -0800

Hi Philip/Alex,

I found I can't understand this patch without more details in the commit 
message. Is this preparation work for the page migration? Why setting the 
translation further bit can force a no-retry-fault? Won't setting this bit 
cause UTCL2 treat the PTE as a PDE and continue to walk to the PTE that this 
PTE pointing to? But from the patch below, the next level of PTE (pseudoly the 
5th level page table) is not set.


Why this setting will cause the wave to enter trap handler? As I understand it, 
wave enters trap handler (saving context) either it encounter a s_trap 
instruction, or CP can initialize suspension to preempt the running waves, or 
context switch can be triggered by certain counter (monitored by SPI). 

Further question will be, what is the planned page migration process? Halt 
wave/save context -> migrate page/update page table-> restore wave?

Also when I check navi10 GPUVM programming guide, it clearly says that 
currently the Translate Further (F) option cannot be used in concert with the 
PDE as PTE (P) option. But this patch set F and P.

Sorry for too many questions.

Regards,
Oak

-----Original Message-----
From: amd-gfx <[email protected]> On Behalf Of Philip Yang
Sent: Friday, November 15, 2019 1:04 PM
To: Sierra Guiza, Alejandro (Alex) <[email protected]>; 
[email protected]
Subject: Re: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a 
retry-fault

One suggestion below, move the flags setting for else path into else path looks 
better.

Philip

On 2019-11-15 12:43 p.m., Alex Sierra wrote:
> After a retry-fault happens, the fault handler will modify the UTCL2 
> to set PTE bits to force a no-retry-fault. This will cause the wave to 
> enter the trap handler.
> 
> Change-Id: I177102891f715068f15605957ff23b0cab862603
> Signed-off-by: Alex Sierra <[email protected]>
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 13 +++++--------
>   1 file changed, 5 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3c0bd6472a46..e4d1a8fc97d5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -3167,7 +3167,8 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device *adev, 
> unsigned int pasid,
>                           uint64_t addr)
>   {
>       struct amdgpu_bo *root;
> -     uint64_t value, flags;
> +     uint64_t value = 0;
> +     uint64_t flags;
>       struct amdgpu_vm *vm;
>       long r;
>   
> @@ -3200,13 +3201,9 @@ bool amdgpu_vm_handle_fault(struct amdgpu_device 
> *adev, unsigned int pasid,
>               AMDGPU_PTE_SYSTEM;
>   
 >-     flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
 >-             AMDGPU_PTE_SYSTEM;
 >
>       if (amdgpu_vm_fault_stop == AMDGPU_VM_FAULT_STOP_NEVER) {
> -             /* Redirect the access to the dummy page */
> -             value = adev->dummy_page_addr;
> -             flags |= AMDGPU_PTE_EXECUTABLE | AMDGPU_PTE_READABLE |
> -                     AMDGPU_PTE_WRITEABLE;
 > +            /* Setting PTE flags to trigger a no-retry-fault  */
 > +            flags = AMDGPU_PTE_EXECUTABLE | AMDGPU_PDE_PTE |
 > +                    AMDGPU_PTE_TF;
        } else {
> -             value = 0;
 >+             flags = AMDGPU_PTE_VALID | AMDGPU_PTE_SNOOPED |
 >+                     AMDGPU_PTE_SYSTEM;
 >      }

>   
>       r = amdgpu_vm_bo_update_mapping(adev, vm, true, NULL, addr, addr + 
> 1,
> 
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

RE: [PATCH] amd/amdgpu: force to trigger a no-retry-fault after a retry-fault

Reply via email to