Hi Evan,
that one looks perfect to me. I was just reading up on the history of
that patch. Alex, what was your concern with it?
Regarding printing this as an error, that's a really good point as well. We
should probably reduce it to warning or even info severity.
Regards,
Christian.
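To make the severity idea concrete, here is a minimal user-space sketch. It assumes the timeout message would stay an error when GPU recovery is enabled and drop to a warning otherwise. That policy, and the names `timeout_severity` and `severity_name`, are hypothetical illustrations and not the driver's actual code.

```c
#include <stdbool.h>

/* Hypothetical severity levels, ordered from least to most severe. */
enum severity { SEV_INFO, SEV_WARN, SEV_ERR };

/*
 * Assumed policy: a job timeout is only reported as an error when the
 * driver is actually going to attempt a GPU recovery. With recovery
 * disabled, the message is just a hint in the log, so a warning is used.
 */
static enum severity timeout_severity(bool recovery_enabled)
{
	return recovery_enabled ? SEV_ERR : SEV_WARN;
}

/* Human-readable name for a severity level. */
static const char *severity_name(enum severity sev)
{
	switch (sev) {
	case SEV_INFO:
		return "info";
	case SEV_WARN:
		return "warning";
	case SEV_ERR:
		return "error";
	}
	return "unknown";
}
```

In the kernel, the equivalent choice would presumably be between the existing error-level and warning-level print helpers at the place where the timeout is reported.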
On 20.03.2018 at 03:11, Quan, Evan wrote:
Hi Christian,
The messages printed on timeout are errors, not just warnings, although we did
not see any real problem (in the dgemm special case). That's why we call it
confusing.
And I suppose you want a fix like my previous patch (see attachment).
Regards,
Evan
-----Original Message-----
From: Christian König [mailto:[email protected]]
Sent: Monday, March 19, 2018 5:42 PM
To: Quan, Evan <[email protected]>; [email protected]
Cc: Deucher, Alexander <[email protected]>
Subject: Re: [PATCH] drm/amdgpu: disable job timeout on GPU reset disabled
On 19.03.2018 at 07:08, Evan Quan wrote:
Under some heavy compute workloads (the dgemm test), it takes the
ASIC more than 10 seconds to finish a single dispatched job, which
triggers the timeout. This is quite confusing, although it does not seem to
cause any real problems.
As a quick workaround, we choose to disable the timeout when GPU reset is
disabled.
NAK, I enabled those warnings intentionally, even when GPU recovery is
disabled, to have a hint in the logs about what goes wrong.
Please only increase the timeout for the compute queues and/or add a
separate timeout for them.
Regards,
Christian.
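One way to read the suggestion above is as a timeout chosen per queue type rather than one global value. The following is a minimal user-space sketch of that idea, not the driver's implementation: `enum ring_type`, `struct timeout_config`, and `ring_timeout_ms` are hypothetical names. The 10000 ms default matches the fallback value in the patch context below, while the 60000 ms compute value in the usage note is only an example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical queue types; the real driver has its own ring type enum. */
enum ring_type { RING_GFX, RING_COMPUTE, RING_SDMA, RING_TYPE_COUNT };

/* Hypothetical timeout settings, in milliseconds. */
struct timeout_config {
	unsigned int default_ms;	/* used for every non-compute queue */
	unsigned int compute_ms;	/* separate, longer timeout for compute */
};

/*
 * Pick the job timeout for a queue. Compute queues get their own value
 * so long-running jobs (such as dgemm) do not trip the default timeout,
 * while all other queues keep reporting timeouts as before.
 */
static unsigned int ring_timeout_ms(const struct timeout_config *cfg,
				    enum ring_type type)
{
	assert(cfg != NULL);
	assert(type < RING_TYPE_COUNT);

	return type == RING_COMPUTE ? cfg->compute_ms : cfg->default_ms;
}
```

With `struct timeout_config cfg = { .default_ms = 10000, .compute_ms = 60000 };`, a gfx or SDMA queue would still time out after 10 seconds and only compute queues would get the longer limit, so the log hint is kept for everything else.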
Change-Id: I3a95d856ba4993094dc7b6269649e470c5b053d2
Signed-off-by: Evan Quan <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 8bd9c3f..9d6a775 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -861,6 +861,13 @@ static void amdgpu_device_check_arguments(struct amdgpu_device *adev)
 		amdgpu_lockup_timeout = 10000;
 	}
+	/*
+	 * Disable timeout when GPU reset is disabled to avoid confusing
+	 * timeout messages in the kernel log.
+	 */
+	if (amdgpu_gpu_recovery == 0 || amdgpu_gpu_recovery == -1)
+		amdgpu_lockup_timeout = INT_MAX;
+
 	adev->firmware.load_type = amdgpu_ucode_get_load_type(adev, amdgpu_fw_load_type);
 }