First of all, patches #10 and #11 look like bug fixes to existing code to
me, so we should fix those problems before working on anything else.
Patch #10 is Reviewed-by: Christian König <[email protected]>
Patch #11:
 	list_for_each_entry(s_job, &sched->ring_mirror_list, node) {
 		struct amd_sched_fence *s_fence = s_job->s_fence;
-		struct fence *fence = sched->ops->run_job(s_job);
+		struct fence *fence;
+		spin_unlock(&sched->job_list_lock);
+		fence = sched->ops->run_job(s_job);
 		atomic_inc(&sched->hw_rq_count);
 		if (fence) {
 			s_fence->parent = fence_get(fence);
@@ -451,6 +453,7 @@ void amd_sched_job_recovery(struct amd_gpu_scheduler *sched)
 			DRM_ERROR("Failed to run job!\n");
 			amd_sched_process_job(NULL, &s_fence->cb);
 		}
+		spin_lock(&sched->job_list_lock);
 	}
 	spin_unlock(&sched->job_list_lock);
The problem is that the job might complete while we dropped the lock.
Please use list_for_each_entry_safe here and add a comment why the list
could be modified in the meantime.
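
To illustrate the iteration pattern I mean, here is a minimal userspace
sketch, not the actual scheduler code. The list helpers are simplified
stand-ins for the kernel's list.h, and struct job, run_job_and_complete(),
resubmit_all() and demo_resubmit() are hypothetical names made up for this
example. It only shows why fetching the next pointer before the loop body
runs keeps the walk valid when the current entry removes itself.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n; /* self-link; the kernel poisons instead */
}

/* Hypothetical job type; only the list linkage matters here. */
struct job {
	struct list_node node;
	int id;
};

/*
 * Stand-in for run_job(): pretend the job completes immediately and
 * unlinks itself, as could happen while the real lock is dropped.
 */
static void run_job_and_complete(struct job *j)
{
	list_del(&j->node);
}

/* Resubmit every job on the list; returns how many were visited. */
static int resubmit_all(struct list_node *head)
{
	struct list_node *pos, *tmp;
	int visited = 0;

	/* "Safe" walk: tmp is fetched before the body can unlink pos. */
	for (pos = head->next, tmp = pos->next; pos != head;
	     pos = tmp, tmp = pos->next) {
		struct job *j = (struct job *)((char *)pos -
					       offsetof(struct job, node));

		/* The real code would drop job_list_lock here ... */
		run_job_and_complete(j);
		/* ... and take it again here. */
		visited++;
	}
	return visited;
}

/* Build a three-job list, resubmit it, and report the visit count. */
static int demo_resubmit(void)
{
	struct list_node head;
	struct job jobs[3];
	int i, visited;

	list_init(&head);
	for (i = 0; i < 3; i++) {
		jobs[i].id = i;
		list_add_tail(&jobs[i].node, &head);
	}
	visited = resubmit_all(&head);
	assert(head.next == &head); /* every job removed itself */
	return visited;
}
```

Note that the saved next pointer only protects against removal of the
current entry. If other entries could also disappear while the lock is
dropped, that needs extra care, which is why the comment in the code is
important.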
With that fixed the patch is Reviewed-by: Christian König
<[email protected]> as well.
The remaining set looks very good to me as well, but I was rather
thinking of a more general approach instead of making it VM PD/PT specific.
For example, we also need to back up and restore shaders when a hard GPU
reset happens.
So I would suggest the following:
1. We add an optional "shadow" flag so that when a BO in VRAM is
allocated we also allocate a shadow BO in GART.
2. We add a second "backup" flag which says that the BO is to be backed up
from VRAM to GART before the next command submission.
3. We set the shadow flag for VM PD/PT BOs, and every time we modify them
we set the backup flag so they get backed up on the next CS.
4. We add an IOCTL to allow setting the backup flag from userspace so
that we can trigger another backup even after the first CS.
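
The steps above could look roughly like the following userspace sketch.
Everything in it is an assumption for illustration: BO_FLAG_SHADOW,
BO_FLAG_BACKUP, struct bo and the bo_*() helpers are hypothetical names,
not existing amdgpu definitions, and plain memory with memcpy() stands in
for VRAM, GART and the GPU copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag bits; not existing amdgpu definitions. */
#define BO_FLAG_SHADOW (1u << 0) /* keep a shadow copy in GART */
#define BO_FLAG_BACKUP (1u << 1) /* copy VRAM -> GART before next CS */

struct bo {
	unsigned int flags;
	size_t size;
	unsigned char *vram;   /* stands in for the VRAM allocation */
	unsigned char *shadow; /* stands in for the GART shadow, may be NULL */
};

static void bo_destroy(struct bo *bo)
{
	if (!bo)
		return;
	free(bo->vram);
	free(bo->shadow);
	free(bo);
}

/* Step 1: allocating with the shadow flag also allocates the shadow. */
static struct bo *bo_create(size_t size, unsigned int flags)
{
	struct bo *bo = calloc(1, sizeof(*bo));

	if (!bo)
		return NULL;
	bo->size = size;
	bo->flags = flags & BO_FLAG_SHADOW;
	bo->vram = calloc(1, size);
	if (bo->flags & BO_FLAG_SHADOW)
		bo->shadow = calloc(1, size);
	if (!bo->vram || ((bo->flags & BO_FLAG_SHADOW) && !bo->shadow)) {
		bo_destroy(bo);
		return NULL;
	}
	return bo;
}

/* Steps 3 and 4: a PD/PT update or a userspace request sets the flag. */
static void bo_request_backup(struct bo *bo)
{
	if (bo->shadow)
		bo->flags |= BO_FLAG_BACKUP;
}

/* Step 2: called before each command submission. */
static void bo_backup_before_cs(struct bo *bo)
{
	if (!(bo->flags & BO_FLAG_BACKUP))
		return;
	assert(bo->shadow); /* the flag is only ever set on shadowed BOs */
	memcpy(bo->shadow, bo->vram, bo->size); /* real driver: GPU copy */
	bo->flags &= ~BO_FLAG_BACKUP;
}

/* After a hard reset, restore the VRAM content from the shadow. */
static void bo_restore_after_reset(struct bo *bo)
{
	if (bo->shadow)
		memcpy(bo->vram, bo->shadow, bo->size);
}

/* Modify, back up, lose VRAM, restore; returns true if content survived. */
static bool demo_backup_restore(void)
{
	struct bo *pt = bo_create(4, BO_FLAG_SHADOW);
	bool ok;

	if (!pt)
		return false;
	pt->vram[0] = 0xAB;            /* page table update */
	bo_request_backup(pt);
	bo_backup_before_cs(pt);       /* next CS triggers the copy */
	memset(pt->vram, 0, pt->size); /* simulate VRAM loss on reset */
	bo_restore_after_reset(pt);
	ok = pt->vram[0] == 0xAB && !(pt->flags & BO_FLAG_BACKUP);
	bo_destroy(pt);
	return ok;
}
```

The point of the split is that the shadow flag is a property fixed at
allocation time, while the backup flag is a one-shot request which is
cleared once the copy has been done.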
What do you think?
Regards,
Christian.
On 25.07.2016 at 09:22, Chunming Zhou wrote:
Since we cannot be sure that VRAM content is intact after a GPU reset, a
page table backup is necessary. A shadow page table is a sensible way to
recover the page table when a GPU reset happens.
We need to allocate a GTT BO as the shadow of the VRAM BO when creating a
page table, and keep the two identical. After a GPU reset, we use SDMA to
copy the GTT BO content to the VRAM BO, and the page table is recovered.
Chunming Zhou (13):
  drm/amdgpu: add pd/pt bo shadow
  drm/amdgpu: update shadow pt bo while update pt
  drm/amdgpu: update pd shadow while updating pd
  drm/amdgpu: implement amdgpu_vm_recover_page_table_from_shadow
  drm/amdgpu: link all vm clients
  drm/amdgpu: add vm_list_lock
  drm/amd: add block entity function
  drm/amdgpu: recover page tables after gpu reset
  drm/amdgpu: add vm recover pt fence
  drm/amd: reset hw count when reset job
  drm/amd: fix deadlock of job_list_lock
  drm/amd: wait neccessary dependency before running job
  drm/amdgpu: fix sched deadoff
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |  17 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  12 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  30 ++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c       |   5 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c       |   5 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c        | 161 ++++++++++++++++++++++++--
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.c |  35 +++++-
 drivers/gpu/drm/amd/scheduler/gpu_scheduler.h |   3 +
 8 files changed, 250 insertions(+), 18 deletions(-)
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx