The reverted commit causes the screen to freeze a few moments after
running clinfo on v6.6-rc7 with ROCm 5.6. Sometimes the rest of the
computer, including ssh, also freezes. On v6.5-rc1, it only results in a
NULL pointer dereference message in dmesg and leaves the process as an
unkillable zombie that prevents shutdown without REISUB. llama.cpp and
hashcat, which worked on v6.2 with ROCm 5.6 and have since broken, are
not fixed by this revert. However, pytorch-rocm is now stable, and
accidentally running clinfo no longer freezes the whole computer.

This reverts commit 1d7776cc148b9f2f3ebaf1181662ba695a29f639.

Closes: https://github.com/RadeonOpenCompute/ROCm/issues/2596
Signed-off-by: Daniel Tang <danielzgtg.opensou...@gmail.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 82f25996ff5e..602f311ab766 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2243,16 +2243,16 @@ int amdgpu_vm_make_compute(struct amdgpu_device *adev, struct amdgpu_vm *vm)
        if (r)
                return r;
 
+       /* Sanity checks */
+       if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
+               r = -EINVAL;
+               goto unreserve_bo;
+       }
+
        /* Check if PD needs to be reinitialized and do it before
         * changing any other state, in case it fails.
         */
        if (pte_support_ats != vm->pte_support_ats) {
-               /* Sanity checks */
-               if (!amdgpu_vm_pt_is_root_clean(adev, vm)) {
-                       r = -EINVAL;
-                       goto unreserve_bo;
-               }
-
                vm->pte_support_ats = pte_support_ats;
                r = amdgpu_vm_pt_clear(adev, vm, to_amdgpu_bo_vm(vm->root.bo),
                                       false);
--
2.40.1


