Patch #1:

Could be that we need to add another module parameter to control this, but I think for now that should be sufficient.

Patch is Reviewed-by: Christian König <christian.koe...@amd.com>

Patch #2:

+       if (direct_submit) {
+               r = amdgpu_ib_schedule(ring, job->num_ibs, job->ibs,
+                                      NULL, NULL, fence);
+               job->fence = fence_get(*fence);
+               if (r)
+                       DRM_ERROR("Error scheduling IBs (%d)\n", r);
+               amdgpu_job_free(job);

When there is an error you should return the code as well.

+       } else {
+               r = amdgpu_job_submit(job, ring, &adev->mman.entity,
+                                     AMDGPU_FENCE_OWNER_UNDEFINED, fence);
+               if (r)
+                       goto error_free;
+       }
return 0;

Just changing this to to "return r;" should be sufficient.

With that fixed the patch is Reviewed-by: Christian König <christian.koe...@amd.com> as well.

Patch #3:

You mentioned that this isn't used during command submission but rather during GPU reset, correct?

In this case I would advise not to use a member in the BO structure to note the backup direction and instead have one function for back and one for restoring the content (or note that as a parameter to the function).

Otherwise we could run into trouble when the CS wants to backup something the GPU reset wants to restore at the same time.

Patch #4: Was already reviewed by me.

Patch #5:

+       pt = params->shadow ? vm->page_tables[pt_idx].entry.robj->shadow :
+               vm->page_tables[pt_idx].entry.robj;
You need to double check here if shadows are actually allocated or not. Otherwise we will crash on an APU.

+       /* double ndw, since need to update shadow pt bo as well */
+       ndw *= 2;
+
We don't need to double the IB size, but only the number of commands in it (ncmds).

The difference is when we want to write scattered GART entries. In this case I've added the PTEs to the end of the IB.

Patch #6:
+       spinlock_t                      shadow_list_lock;
We might want to use a mutex here instead. Usually I would agree that a spin lock is better for a linked list, but during the restore process in a GPU reset we probably want to sleep here.

Alternatively you could splice the list to a local version on the stack, grab references to the BOs and then drop the lock during the restore process.

@@ -541,6 +546,13 @@ void amdgpu_bo_unref(struct amdgpu_bo **bo)
        if ((*bo) == NULL)
                return;
+ /* shadow must be freed before bo itself */
+       if ((!(*bo)->shadow) && !list_empty(&(*bo)->shadow_list)) {
+               spin_lock(&(*bo)->adev->shadow_list_lock);
+               list_del_init(&(*bo)->shadow_list);
+               spin_unlock(&(*bo)->adev->shadow_list_lock);
+
+       }
        tbo = &((*bo)->tbo);
        ttm_bo_unref(&tbo);
        if (tbo == NULL)
That would trigger even when we take a temporary reference. I suggest to add a amdgpu_bo_unref_shadow() function instead, unreferencing both the shadow and the normal BO.

This can then be used in the VM code to cleanup the shadow created.

Going to skip patch #7 and #8 for now cause our team call begins in a moment.

Regards,
Christian.

Am 17.08.2016 um 08:05 schrieb Chunming Zhou:
Since we cannot ensure VRAM is consistent after a GPU reset, page
table shadowing is necessary. Shadowed page tables are, in a sense, a
method to recover the consistent state of the page tables before the
reset occurred.

We need to allocate GTT bo as the shadow of VRAM bo when creating page table,
and make them the same. After gpu reset, we will need to use SDMA to copy GTT bo
content to VRAM bo, then page table will be recoveried.


V2:
Shadow bo uses a shadow entity running on normal run queue, after gpu reset,
we need to wait for all shadow jobs finished first, then recovery page table 
from shadow.

V3:
Addressed Christian comments for shadow bo part.

V4:
Switch back to update page table twice (one of two is for shadow)

V5:
make it be gerneic shadow bo support. Address Christian comments.

Chunming Zhou (8):
   drm/amdgpu: add need backup function V2
   drm/amdgpu: add direct submision option for copy_buffer
   drm/amdgpu: sync bo and shadow V2
   drm/amdgpu: update pd shadow while updating pd
   drm/amdgpu: update pt shadow while updating pt V2
   drm/amdgpu: link all shadow bo
   drm/amdgpu: implement recovery vram bo from shadow
   drm/amdgpu: recover vram bo from shadow after gpu reset

  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  9 ++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_benchmark.c |  3 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    | 39 ++++++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c    | 94 ++++++++++++++++++++++++++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  9 +++
  drivers/gpu/drm/amd/amdgpu/amdgpu_test.c      |  4 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 21 ++++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 69 ++++++++++++++------
  8 files changed, 215 insertions(+), 33 deletions(-)


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to