On 09.04.23 at 20:59, Bas Nieuwenhuizen wrote:
Instead of failing somewhere in the scheduler after the
ioctl has already succeeded.

Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2498
Signed-off-by: Bas Nieuwenhuizen <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c  | 9 +++++++++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 5 +++++
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
  3 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 995ee9ff65c9..8db6618b9049 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -113,6 +113,15 @@ int amdgpu_job_alloc(struct amdgpu_device *adev, struct amdgpu_vm *vm,
        if (!entity)
                return 0;
+       if (entity->rq && entity->rq->sched) {

I've just double-checked, and this stuff here is not necessarily initialized yet. We need to move this around a bit.

Probably the best place for the check is in amdgpu_cs_submit() after calling drm_sched_job_arm().

Alternatively, we could go the other way around. Instead of keeping max_ibs in the ring, we keep a max_ibs per ip_type in adev and make sure during initialization that each ring can handle at least that many.

Then we can check if the num_ibs are valid in amdgpu_cs_p1_ib() when we count them.

Thinking more about it, the latter is probably the better variant.
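
To make the second variant a bit more concrete, here is a rough, standalone sketch of the idea. It is not amdgpu code: every sketch_* name below is a hypothetical stand-in (for adev, the ring, and the counting done in amdgpu_cs_p1_ib()), and the capacity formula is simply the one from the patch. The point is only that the limit lives per IP type, each ring is validated against it once at init, and the submission is rejected while the IBs are being counted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins; none of these names exist in amdgpu. */
enum sketch_ip_type { SKETCH_IP_GFX, SKETCH_IP_COMPUTE, SKETCH_IP_NUM };

struct sketch_ring {
	enum sketch_ip_type type;
	unsigned max_dw;          /* dwords available for one submission */
	unsigned emit_frame_size; /* fixed per-submission overhead in dwords */
	unsigned emit_ib_size;    /* dwords needed per IB, 0 if not applicable */
};

struct sketch_device {
	unsigned max_ibs[SKETCH_IP_NUM]; /* limit per IP type, kept in the device */
};

/* Init time: make sure the ring can hold at least the per-IP limit. */
static int sketch_ring_init(const struct sketch_device *dev,
			    const struct sketch_ring *ring)
{
	unsigned capacity;

	assert((unsigned)ring->type < SKETCH_IP_NUM);

	if (!ring->emit_ib_size)
		return 0; /* no per-IB cost, nothing to verify */
	if (ring->max_dw < ring->emit_frame_size)
		return -EINVAL;

	/* Same formula the patch uses for ring->max_ibs. */
	capacity = (ring->max_dw - ring->emit_frame_size) / ring->emit_ib_size;
	return capacity >= dev->max_ibs[ring->type] ? 0 : -EINVAL;
}

/* Parse time: count one more IB, rejecting it once the limit is reached. */
static int sketch_count_ib(const struct sketch_device *dev,
			   enum sketch_ip_type type, unsigned *num_ibs)
{
	if ((unsigned)type >= SKETCH_IP_NUM)
		return -EINVAL;
	if (*num_ibs >= dev->max_ibs[type])
		return -EINVAL;

	(*num_ibs)++;
	return 0;
}
```

With this split, the parse-time check only needs the IP type and never has to look at entity->rq->sched, which is the part that is not necessarily initialized at that point.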

Regards,
Christian.

+               struct amdgpu_ring *ring = to_amdgpu_ring(entity->rq->sched);
+
+               if (num_ibs > ring->max_ibs) {
+                       DRM_DEBUG("Rejected a submission with too many IBs");
+                       return -EINVAL;
+               }
+       }
+
        return drm_sched_job_init(&(*job)->base, entity, owner);
  }
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index dc474b809604..933cb95a0e30 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -324,6 +324,11 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring,
        ring->max_dw = max_dw;
        ring->hw_prio = hw_prio;
+       if (ring->funcs->emit_ib_size) {
+               ring->max_ibs =
+                       (max_dw - ring->funcs->emit_frame_size) / ring->funcs->emit_ib_size;
+       }
+
        if (!ring->no_scheduler) {
                hw_ip = ring->funcs->type;
                num_sched = &adev->gpu_sched[hw_ip][hw_prio].num_scheds;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
index 3989e755a5b4..7a295d80728b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
@@ -245,6 +245,7 @@ struct amdgpu_ring {
        unsigned                ring_size;
        unsigned                max_dw;
        int                     count_dw;
+       unsigned                max_ibs;
        uint64_t                gpu_addr;
        uint64_t                ptr_mask;
        uint32_t                buf_mask;
