On 07/09/2018 04:55 PM, Christian König wrote:
On 09.07.2018 at 09:48, Zhang, Jerry (Junwei) wrote:
On 07/09/2018 03:04 PM, Christian König wrote:
On 09.07.2018 at 07:13, Zhang, Jerry (Junwei) wrote:
On 07/06/2018 03:27 AM, Andrey Grodzovsky wrote:
Extract and present the responsible process and thread when
VM_FAULT happens.

v2: Use getter and setter functions.

Signed-off-by: Andrey Grodzovsky <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c |  4 ++++
  drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c  | 10 +++++++---
  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c  |  9 +++++++--
  3 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 7a625f3..609c8f5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -187,6 +187,10 @@ static int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)
      if (p->uf_entry.robj)
          p->job->uf_addr = uf_offset;
      kfree(chunk_array);
+
+    /* Use this opportunity to fill in task info for the vm */
+    amdgpu_vm_set_task_info(vm);
+

Shall we set the task info at VM init instead?

No, vm_init() is called from a completely different process than the one that
later uses the VM.

Originally I thought the UMD opened the DRI node and created a VM through vm_init(),
and that every command submission would then be passed in with the same VM
initialized by vm_init().

So that's a different process?

The display server, e.g. X or Wayland.

See, with DRI3 a connection to the hardware is opened as follows: the
display server opens the file descriptor and, in doing so, calls vm_init.

And then this file descriptor is passed to the client processes through IPC.
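
To illustrate the hand-off described above, here is a minimal userspace sketch of passing an open file descriptor over a Unix domain socket with SCM_RIGHTS. It is not code from the DDX or from any DRI3 implementation; the helper names send_fd() and recv_fd() are hypothetical. It only shows why the process that opened the descriptor (and so triggered vm_init) need not be the process that later submits work through it.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper: send the open descriptor `fd` over Unix socket `sock`. */
int send_fd(int sock, int fd)
{
    char byte = 0; /* at least one byte of ordinary data must accompany the fd */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align; /* forces correct alignment of buf */
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;

    memset(&u, 0, sizeof(u));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive a descriptor; returns it, or -1 on failure. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } u;
    struct msghdr msg = { 0 };
    struct cmsghdr *cmsg;
    int fd;

    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS)
        return -1;

    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file, so any per-file state set up at open time was created by the sender, not by the receiver.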

Thanks for the reply.
Yes, the amdgpu node is likely opened in the DDX at driver probe and then passed to
other clients.

That looks like it only happens during initialization, when the X server loads
the DDX driver.
But when I start 2 glxgears instances, kms_open()->vm_init() is called twice,
which looks related to the application as well.
(Anyway, I will check it further.)

Even so, it sounds like the VM has to be created by vm_init() first, and then we use
that VM for the process on every command submission.
So I thought we could set the task info once, in vm_init(), and use that info
in the VM fault handling function.

Jerry


Regards,
Christian.

      return 0;

  free_all_kdata:
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
index 08753e7..75f3ffb 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v8_0.c
@@ -46,7 +46,6 @@

  #include "ivsrcid/ivsrcid_vislands30.h"

-
  static void gmc_v8_0_set_gmc_funcs(struct amdgpu_device *adev);
  static void gmc_v8_0_set_irq_funcs(struct amdgpu_device *adev);
  static int gmc_v8_0_wait_for_idle(void *handle);
@@ -1449,8 +1448,13 @@ static int gmc_v8_0_process_interrupt(struct amdgpu_device *adev,
          gmc_v8_0_set_fault_enable_default(adev, false);

      if (printk_ratelimit()) {
-        dev_err(adev->dev, "GPU fault detected: %d 0x%08x\n",
-            entry->src_id, entry->src_data[0]);
+        struct amdgpu_task_info task_info = { 0 };
+
+        amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);

Shall we find the VM and read vm->task_info directly instead of filling a local
task_info variable?
(The current style is also OK for me; I'm just asking for more info.)

No, we can't guarantee that the task_info pointer in the VM won't be freed 
after dropping the spinlock. So returning a copy here is the better approach.
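
The copy-out pattern described above can be sketched in userspace as follows. This is only an illustration under stated assumptions, not the actual amdgpu_vm_get_task_info() implementation: the struct layouts, the fixed-size vm_table, the C11 atomic_flag spinlock, and the name get_task_info() are all hypothetical stand-ins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/types.h>

#define TASK_NAME_LEN 16 /* assumed size, modeled on the kernel's TASK_COMM_LEN */
#define VM_TABLE_SIZE 4  /* hypothetical; stands in for the real pasid lookup */

struct task_info {
    char process_name[TASK_NAME_LEN];
    char task_name[TASK_NAME_LEN];
    pid_t pid;
    pid_t tgid;
};

/* Hypothetical stand-in for a VM that may be torn down by another thread. */
struct vm {
    unsigned int pasid;
    struct task_info task_info;
};

static atomic_flag table_lock = ATOMIC_FLAG_INIT;
static struct vm *vm_table[VM_TABLE_SIZE];

static void table_lock_acquire(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&table_lock, memory_order_acquire))
        ;
}

static void table_lock_release(void)
{
    atomic_flag_clear_explicit(&table_lock, memory_order_release);
}

/* Caller must hold table_lock. */
static struct vm *vm_lookup_locked(unsigned int pasid)
{
    for (size_t i = 0; i < VM_TABLE_SIZE; i++)
        if (vm_table[i] && vm_table[i]->pasid == pasid)
            return vm_table[i];
    return NULL;
}

/*
 * Copy the task info into caller-owned storage while the lock is held.
 * No pointer into the VM escapes, so the VM may be freed right after the
 * lock is dropped. On a lookup miss, *out is left untouched.
 */
void get_task_info(unsigned int pasid, struct task_info *out)
{
    struct vm *vm;

    assert(out != NULL);

    table_lock_acquire();
    vm = vm_lookup_locked(pasid);
    if (vm)
        *out = vm->task_info; /* struct copy, done under the lock */
    table_lock_release();
}
```

Returning a pointer to vm->task_info instead would require the caller to keep the lock held, or to hold a reference on the VM, for as long as it uses the data.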


And we may also want to check the return value for the NULL case; otherwise it may
cause an access error in dev_err()
if we fail to find the VM (although that is very unlikely to happen).

Since we use a copy of the task info we should never get a NULL pointer. The strings
should already be zero-terminated thanks to the "{ 0 }" initialization above.
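
As a small illustration of that point, the sketch below formats a zero-initialized struct the same way the fault message does. The struct layout and the helper name format_fault() are assumptions made for the example; only the "{ 0 }" initializer and the format string fragment come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumed layout for illustration; field sizes are not taken from the patch. */
struct task_info {
    char process_name[16];
    char task_name[16];
    int pid;
    int tgid;
};

/* Hypothetical helper that mirrors the tail of the dev_err() format string. */
int format_fault(char *buf, size_t len, const struct task_info *ti)
{
    assert(buf != NULL && ti != NULL);
    return snprintf(buf, len, "for process %s pid %d thread %s pid %d",
                    ti->process_name, ti->tgid, ti->task_name, ti->pid);
}
```

With `struct task_info ti = { 0 };` every byte of both name arrays is zero, so each %s sees an empty, properly terminated string even if the lookup never fills the struct in.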

Thanks for explaining that further.
Got it, fine with me.

Jerry


Christian.


Jerry

+
+        dev_err(adev->dev, "GPU fault detected: %d 0x%08x for process %s pid %d thread %s pid %d\n",
+            entry->src_id, entry->src_data[0], task_info.process_name,
+            task_info.tgid, task_info.task_name, task_info.pid);
          dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_ADDR 0x%08X\n",
              addr);
          dev_err(adev->dev, " VM_CONTEXT1_PROTECTION_FAULT_STATUS 0x%08X\n",
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 691a659..9df94b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
@@ -259,11 +259,16 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
      }

      if (printk_ratelimit()) {
+        struct amdgpu_task_info task_info = { 0 };
+
+        amdgpu_vm_get_task_info(adev, entry->pasid, &task_info);
+
          dev_err(adev->dev,
-            "[%s] VMC page fault (src_id:%u ring:%u vmid:%u pasid:%u)\n",
+            "[%s] VMC page fault (src_id:%u ring:%u vmid:%u pasid:%u, for process %s pid %d thread %s pid %d)\n",
              entry->vmid_src ? "mmhub" : "gfxhub",
              entry->src_id, entry->ring_id, entry->vmid,
-            entry->pasid);
+            entry->pasid, task_info.process_name, task_info.tgid,
+            task_info.task_name, task_info.pid);
          dev_err(adev->dev, "  at page 0x%016llx from %d\n",
              addr, entry->client_id);
          if (!amdgpu_sriov_vf(adev))


