On 07/09/2018 04:55 PM, Christian König wrote:
Am 09.07.2018 um 09:48 schrieb Zhang, Jerry (Junwei):
On 07/09/2018 03:04 PM, Christian König wrote:
Am 09.07.2018 um 07:13 schrieb Zhang, Jerry (Junwei):
On 07/06/2018 03:27 AM, Andrey Grodzovsky wrote:
Extract and present the reposnsible process and thread when
VM_FAULT happens.
v2: Use getter and setter functions.
Signed-off-by: Andrey Grodzovsky <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ++++
drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c | 10 +++++++---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 9 +++++++--
3 files changed, 18 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 7a625f3..609c8f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -187,6 +187,10 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser
*p, void *data)
if (p->uf_entry.robj)
p->job->uf_addr = uf_offset;
kfree(chunk_array);
+
+ /* Use this opportunity to fill in task info for the vm */
+ amdgpu_vm_set_task_info(vm);
+
Shall we set the task info when vm init?
No, vm_init() is called from a completely different process which is later on
user of the VM.
Originally I thought UMD opened DRI node and create a VM by vm_init(), then
every command submission
would be passed in the same VM initialized by vm_init().
So that's different process?
The display server, e.g. X or Wayland.
See with DRI3 the process of opening a connection to the hardware is that the
display server open the file descriptor and with that calls vm_init.
And then this file descriptor is passed to the client processes through IPC.
Thanks to reply.
yes, it's likely to open amdgpu node in ddx when driver probe and pass it to
other client.
While that looks like just in the process of initialization that X server loads
ddx driver.
But when I start 2 glxgears, that kms_open()->vm_init() will be called twice,
which looks related to App as well.
(anyway, I will check it more)
Even so, it sounds vm_init() should be created firstly, then we use that VM for
process on every command submission.
So I thought to set the task info at the first time vm_init() and use that info
on VM fault process func.
Jerry
Regards,
Christian.
return 0;
free_all_kdata:
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 08753e7..75f3ffb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -46,7 +46,6 @@
#include "ivsrcid/ivsrcid_vislands30.h"
-
static void gmc_v8_0_set_gmc_funcs(struct amdgpu_device *adev);
static void gmc_v8_0_set_irq_funcs(struct amdgpu_device *adev);
static int gmc_v8_0_wait_for_idle(void *handle);
@@ -1449,8 +1448,13 @@ static int gmc_v8_0_process_interrupt(struct
amdgpu_device *adev,
gmc_v8_0_set_fault_enable_default(adev, false);
if (printk_ratelimit()) {
- dev_err(adev->dev, "GPU fault detected: %d 0x%08x\n",
- entry->src_id, entry->src_data[0]);
+ struct amdgpu_task_info task_info = { 0 };
+
+ amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
Shall we find vm and get the vm->task_info directly instead of filling local
variable task_info?
(current style is also OK for me, just for more info)
No, we can't guarantee that the task_info pointer in the VM won't be freed
after dropping the spinlock. So returning a copy here is the better approach.
And we may also check the return value for "NULL" case, otherwise it may cause
access errorin dev_err(),
if failed to find vm (although, it's most unlikely to happen).
Since we use a copy of the task info we should never get a NULL pointer. The string
should already be zero terminated with the "{ 0 }" initialization above.
Thanks to explain that more.
Got it, fine for me.
Jerry
Christian.
Jerry
+
+ dev_err(adev->dev, "GPU fault detected: %d 0x%08x for process %s pid %d
thread %s pid %d\n",
+ entry->src_id, entry->src_data[0], task_info.process_name,
+ task_info.tgid, task_info.task_name, task_info.pid);
dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x%08X\n",
addr);
dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x%08X\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 691a659..9df94b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -259,11 +259,16 @@ static int gmc_v9_0_process_interrupt(struct
amdgpu_device *adev,
}
if (printk_ratelimit()) {
+ struct amdgpu_task_info task_info = { 0 };
+
+ amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
+
dev_err(adev->dev,
- "[%s] VMC page fault (src_id:%u ring:%u vmid:%u pasid:%u)\n",
+ "[%s] VMC page fault (src_id:%u ring:%u vmid:%u pasid:%u, for process
%s pid %d thread %s pid %d\n)\n",
entry->vmid_src ? "mmhub" : "gfxhub",
entry->src_id, entry->ring_id, entry->vmid,
- entry->pasid);
+ entry->pasid, task_info.process_name, task_info.tgid,
+ task_info.task_name, task_info.pid);
dev_err(adev->dev, " at page 0x%016llx from %d\n",
addr, entry->client_id);
if (!amdgpu_sriov_vf(adev))
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx