Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot

S, Shirish Fri, 13 Apr 2018 01:32:22 -0700


On 4/13/2018 12:38 PM, Christian König wrote:

Am 13.04.2018 um 09:01 schrieb S, Shirish:
On 4/13/2018 11:53 AM, Christian König wrote:
Am 13.04.2018 um 06:07 schrieb Shirish S:
amdgpu_ib_ring_tests() runs test IB's on rings at boot
contributes to ~500 ms of amdgpu driver's boot time.

This patch defers it and adds a check to report
in amdgpu_info_ioctl() if it was scheduled or not.
That is rather suboptimal, but see below.
Which part is sub-optimal, deferring or checking if the work isscheduled?
That was about the check. We should wait for the test to finishinstead of printing an error and continuing.

Done. Have made this change in V2.

Signed-off-by: Shirish S <shiris...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  2 ++
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 20 +++++++++++++++++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c    |  3 +++
  3 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.hb/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 5734871..ae8f722 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1611,6 +1611,8 @@ struct amdgpu_device {
/* delayed work_func for deferring clockgating duringresume */
      struct delayed_work     late_init_work;
+ /* delayed work_func to defer testing IB's on rings duringboot */
+    struct delayed_work     late_init_test_ib_work;
You must put the IB test into the late_init_work as well, otherwisethe two delayed workers can race with each other.
I thought from the comment above the declaration, its clear why i amcreating 2 work structures.late_init_work is to optimize resume time and late_init_test_ib_workis to optimize the boot time.There cant be race as the context in which they are called aretotally different.
Late init enables power and clock gating. If I'm not completelymistaken we don't do the power/clock gating earlier because we had towait for the IB test to finish.
Could be that modern ASICs have additional logic to prevent that, butthe last time I worked on this power gating a block while you runsomething on it could even crash the whole system.
        struct amdgpu_virt    virt;
      /* firmware VRAM reservation */
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.cb/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 1762eb4..e65a5e6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -63,6 +63,7 @@ MODULE_FIRMWARE("amdgpu/vega12_gpu_info.bin");
  MODULE_FIRMWARE("amdgpu/raven_gpu_info.bin");
    #define AMDGPU_RESUME_MS        2000
+#define AMDGPU_IB_TEST_SCHED_MS        2000
    static const char *amdgpu_asic_name[] = {
      "TAHITI",
@@ -2105,6 +2106,16 @@ bool amdgpu_device_asic_has_dc_support(enumamd_asic_type asic_type)
      }
  }
+static void amdgpu_device_late_init_test_ib_func_handler(structwork_struct *work)
+{
+    struct amdgpu_device *adev =
+ container_of(work, struct amdgpu_device,late_init_test_ib_work.work);
+    int r = amdgpu_ib_ring_tests(adev);
+
+    if (r)
+        DRM_ERROR("ib ring test failed (%d).\n", r);
+}
+
  /**
   * amdgpu_device_has_dc_support - check if dc is supported
   *
@@ -2212,6 +2223,8 @@ int amdgpu_device_init(struct amdgpu_device*adev,
      INIT_LIST_HEAD(&adev->ring_lru_list);
      spin_lock_init(&adev->ring_lru_list_lock);
  + INIT_DELAYED_WORK(&adev->late_init_test_ib_work,
+ amdgpu_device_late_init_test_ib_func_handler);
      INIT_DELAYED_WORK(&adev->late_init_work,
                amdgpu_device_ip_late_init_func_handler);
@@ -2374,9 +2387,9 @@ int amdgpu_device_init(struct amdgpu_device*adev,
          goto failed;
      }
  -    r = amdgpu_ib_ring_tests(adev);
-    if (r)
-        DRM_ERROR("ib ring test failed (%d).\n", r);
+    /* Schedule amdgpu_ib_ring_tests() */
+    mod_delayed_work(system_wq, &adev->late_init_test_ib_work,
+            msecs_to_jiffies(AMDGPU_IB_TEST_SCHED_MS));
That doesn't work like you intended. mod_delayed_work() overridesthe existing handler.
What you wanted to use is queue_delayed_work(), but as I said weshould only have one delayed worker.
mod_delayed_work() is safer and optimal method that replacescancel_delayed_work() followed by queue_delayed_work().
(https://lkml.org/lkml/2011/2/3/175)
But if you strongly insist i don't mind changing it.
Well, mod_delayed_work() does NOT replace queue_delayed_work(). Thosetwo functions are for different use cases.
The link you posted actually explains it quite well:
So, cancel_delayed_work() followed by queue_delayed_work() schedules
the work to be executed at the specified time regardless of the
current pending state while queue_delayed_work() takes effect iff
currently the work item is not pending.
queue_delayed_work() takes only effect if the work item is not alreadypending/executing.
In other words with queue_delayed_work() you don't risk executingthings twice when the work is already executing.
While with mod_delayed_work() you absolutely make sure to executethings twice when it is already running.
If I'm not completely mistaken it should not matter in this case, butqueue_delayed_work() is used more commonly I think.

Since this work is run only once that too at the boot, we wont have acase where in it will be required to

check if its pending or not.

Anyway, i have changed mod_delayed_work() to queue_delayed_work() andadded flush_delayed_work() in amdgpu_info_ioctl() in V2.

Thanks.
Regards,
Shirish S

        if (amdgpu_sriov_vf(adev))
          amdgpu_virt_init_data_exchange(adev);

@@ -2469,6 +2482,7 @@ void amdgpu_device_fini(struct amdgpu_device*adev)

      }
      adev->accel_working = false;
cancel_delayed_work_sync(&adev->late_init_work);
+ cancel_delayed_work_sync(&adev->late_init_test_ib_work);
      /* free i2c buses */
      if (!amdgpu_device_has_dc_support(adev))
          amdgpu_i2c_fini(adev);

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.cb/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

index 487d39e..057bd9a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c

@@ -279,6 +279,9 @@ static int amdgpu_info_ioctl(struct drm_device*dev, void *data, struct drm_file

      if (!info->return_size || !info->return_pointer)
          return -EINVAL;
  +    if (delayed_work_pending(&adev->late_init_test_ib_work))
+        DRM_ERROR("IB test on ring not executed\n");
+


Please use flush_delayed_work() instead of issuing and error here.

Agree, wasn't sure of what to do here :).

So i will re-spin with the flush part added. Hope this replyclarifies your comments.

Thanks.
Regards,
Shirish S

Regards,
Christian.

      switch (info->query) {
      case AMDGPU_INFO_ACCEL_WORKING:
          ui32 = adev->accel_working;


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: defer test IBs on the rings at boot

Reply via email to