On 22.08.2017 at 23:34, Jay Cornwall wrote:
On Tue, Aug 22, 2017, at 16:17, Felix Kuehling wrote:
Thanks Alex!

Jay, do you think this is enough? This bumps the number of concurrent
operations on KIQ to 4 by default.
I'm not sure what the best number is. Up to 8 KFD processes is common
(beyond that performance drops off due to VMID availability) but I'm not
sure how often they would need to submit to KIQ concurrently. If it's
not expensive I'd just bump it up to say 16.

Well we allocate an array of pointers as ring buffer for the fences.

So I would say let's set this to 256, because 256 * number_of_entries_per_hw_submission * number_of_bytes_for_a_pointer = 4096.

This way we use up exactly one page for the fence array.
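
The page-size arithmetic above can be sketched in C. This is only an illustration, not the driver code: the helper name fence_array_bytes and both constants are assumptions made for the example. The product only comes out to 4096 if there are two fence slots per hardware submission and pointers are 8 bytes wide (64-bit build), so those are the values assumed here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption for illustration: two fence slots per hardware submission. */
#define FENCE_SLOTS_PER_HW_SUBMISSION 2u
/* Assumption for illustration: 8-byte pointers (64-bit kernel). */
#define ASSUMED_POINTER_BYTES 8u

/*
 * Hypothetical helper: bytes needed for an array of fence pointers
 * when the ring allows num_hw_submission outstanding submissions.
 */
static size_t fence_array_bytes(unsigned int num_hw_submission)
{
    assert(num_hw_submission > 0);
    return (size_t)num_hw_submission *
           FENCE_SLOTS_PER_HW_SUBMISSION *
           ASSUMED_POINTER_BYTES;
}
```

Under these assumptions, fence_array_bytes(256) is 4096 bytes, i.e. one 4 KiB page, while the default limit of 2 needs only 32 bytes and the patch's KIQ value of 4 needs 64.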

Regards,
Christian.


The performance problem isn't that bad since all the KIQ requests are
serialized but the dmesg spam is not nice. Perhaps lowering the severity
of the 'rcu slot is busy' message would address that as well?

Regards,
   Felix


On 2017-08-22 04:49 PM, Alex Deucher wrote:
KIQ doesn't really use the GPU scheduler.  The base
drivers generally use the KIQ ring directly rather than
submitting IBs.  However, amdgpu_sched_hw_submission
(which defaults to 2) limits the number of outstanding
fences to 2.  KFD uses the KIQ for TLB flushes and the
2 fence limit hurts performance when there are several KFD
processes running.

Signed-off-by: Alex Deucher <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 14 ++++++++++++--
  1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 6c5646b..f39b851 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -170,6 +170,16 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
                     unsigned irq_type)
  {
        int r;
+       int sched_hw_submission = amdgpu_sched_hw_submission;
+
+       /* Set the hw submission limit higher for KIQ because
+        * it's used for a number of gfx/compute tasks by both
+        * KFD and KGD which may have outstanding fences and
+        * it doesn't really use the gpu scheduler anyway;
+        * KIQ tasks get submitted directly to the ring.
+        */
+       if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)
+               sched_hw_submission *= 2;
        if (ring->adev == NULL) {
                if (adev->num_rings >= AMDGPU_MAX_RINGS)
@@ -179,7 +189,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
                ring->idx = adev->num_rings++;
                adev->rings[ring->idx] = ring;
                r = amdgpu_fence_driver_init_ring(ring,
-                       amdgpu_sched_hw_submission);
+                                                 sched_hw_submission);
                if (r)
                        return r;
        }
@@ -219,7 +229,7 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
        }
        ring->ring_size = roundup_pow_of_two(max_dw * 4 *
-                                            amdgpu_sched_hw_submission);
+                                            sched_hw_submission);
        ring->buf_mask = (ring->ring_size / 4) - 1;
        ring->ptr_mask = ring->funcs->support_64bit_ptrs ?
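
For readers following the last hunk, here is a rough userspace sketch of how the submission limit feeds into the ring size and buffer mask. It mirrors only the two expressions visible in the diff (roundup_pow_of_two(max_dw * 4 * sched_hw_submission) and (ring_size / 4) - 1). The function names and the max_dw value of 1024 used in the note below are hypothetical, and roundup_pow_of_two_u32 is a simple stand-in for the kernel helper, not its actual implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the kernel's roundup_pow_of_two(); illustration only. */
static uint32_t roundup_pow_of_two_u32(uint32_t n)
{
    uint32_t p = 1;

    assert(n <= 0x80000000u); /* avoid overflowing the shift below */
    while (p < n)
        p <<= 1;
    return p;
}

/* Ring size in bytes: max_dw dwords (4 bytes each) per submission. */
static uint32_t ring_size_bytes(uint32_t max_dw, uint32_t sched_hw_submission)
{
    return roundup_pow_of_two_u32(max_dw * 4 * sched_hw_submission);
}

/* Mask over dword indices, valid because the size is a power of two. */
static uint32_t ring_buf_mask(uint32_t ring_size)
{
    return (ring_size / 4) - 1;
}
```

With a hypothetical max_dw of 1024, the default limit of 2 gives an 8192-byte ring, and the doubled KIQ limit of 4 gives 16384 bytes with a buffer mask of 4095. Any larger limit grows the ring buffer in the same way, which is worth keeping in mind when picking a value such as 16 or 256.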
_______________________________________________
amd-gfx mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/amd-gfx