Am 20.08.2018 um 05:35 schrieb Emily Deng:
From: Monk Liu <monk....@amd.com>

SWDEV-146499: hang during multi vulkan process testing

cause:
the second frame's PREAMBLE_IB have clear-state
and LOAD actions, those actions ruin the pipeline
that is still doing process in the previous frame's
work-load IB.

fix:
need insert pipeline sync if have context switch for
SRIOV (because only SRIOV will report PREEMPTION flag
to UMD)

Change-Id: If676da7a9b0147114cd76d19b6035ed8033de449
Signed-off-by: Monk Liu <monk....@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 15 +++++++++++----
  1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
index 5c22cfd..66efa85 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c
@@ -95,6 +95,12 @@ void amdgpu_ib_free(struct amdgpu_device *adev, struct 
amdgpu_ib *ib,
        amdgpu_sa_bo_free(adev, &ib->sa_bo, f);
  }
+static bool amdgpu_ib_has_preamble(struct amdgpu_ring *ring, bool ctx_switch) {
+       return (amdgpu_sriov_vf(ring->adev) &&
+                       ring->funcs->type == AMDGPU_RING_TYPE_GFX &&
+                       ctx_switch);
+}
+

Well NAK, please merge that into amdgpu_ib_schedule() and drop the extra check for GFX, that will apply to compute queues as well.

  /**
   * amdgpu_ib_schedule - schedule an IB (Indirect Buffer) on the ring
   *
@@ -123,7 +129,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
        struct amdgpu_device *adev = ring->adev;
        struct amdgpu_ib *ib = &ibs[0];
        struct dma_fence *tmp = NULL;
-       bool skip_preamble, need_ctx_switch;
+       bool need_ctx_switch;
        unsigned patch_offset = ~0;
        struct amdgpu_vm *vm;
        uint64_t fence_ctx;
@@ -156,6 +162,8 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
                return -EINVAL;
        }
+ need_ctx_switch = ring->current_ctx != fence_ctx;
+
        alloc_size = ring->funcs->emit_frame_size + num_ibs *
                ring->funcs->emit_ib_size;
@@ -167,6 +175,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs, if (ring->funcs->emit_pipeline_sync && job &&
            ((tmp = amdgpu_sync_get_fence(&job->sched_sync, NULL)) ||
+                amdgpu_ib_has_preamble(ring, need_ctx_switch) ||
             amdgpu_vm_need_pipeline_sync(ring, job))) {
                need_pipe_sync = true;
@@ -200,8 +209,6 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned num_ibs,
                        amdgpu_asic_flush_hdp(adev, ring);
        }
- skip_preamble = ring->current_ctx == fence_ctx;
-       need_ctx_switch = ring->current_ctx != fence_ctx;
        if (job && ring->funcs->emit_cntxcntl) {
                if (need_ctx_switch)
                        status |= AMDGPU_HAVE_CTX_SWITCH;
@@ -215,7 +222,7 @@ int amdgpu_ib_schedule(struct amdgpu_ring *ring, unsigned 
num_ibs,
/* drop preamble IBs if we don't have a context switch */
                if ((ib->flags & AMDGPU_IB_FLAG_PREAMBLE) &&
-                       skip_preamble &&
+                       !need_ctx_switch &&
                        !(status & AMDGPU_PREAMBLE_IB_PRESENT_FIRST) &&
                        !amdgpu_sriov_vf(adev)) /* for SRIOV preemption, 
Preamble CE ib must be inserted anyway */

Ok, please clean that up more carefully. All prerequisites which are not IB dependent should come before the loop.

BTW: The handling of AMDGPU_PREAMBLE_IB_PRESENT_FIRST in amdgpu_cs.c is completely broken as well.

Regards,
Christian.

                        continue;

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Reply via email to