Hi Christian,

Please see my inline comments.
-----Original Message-----
From: Koenig, Christian <christian.koe...@amd.com>
Sent: April 22, 2020 15:54
To: Tao, Yintian <yintian....@amd.com>; Liu, Monk <monk....@amd.com>; Liu, Shaoyun <shaoyun....@amd.com>; Kuehling, Felix <felix.kuehl...@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: refine kiq access register

On 22.04.20 at 09:49, Tao, Yintian wrote:
> Hi Christian
>
> Please see inline comments.
>
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> Sent: April 22, 2020 15:40
> To: Tao, Yintian <yintian....@amd.com>; Koenig, Christian <christian.koe...@amd.com>; Liu, Monk <monk....@amd.com>; Liu, Shaoyun <shaoyun....@amd.com>; Kuehling, Felix <felix.kuehl...@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
>
> On 22.04.20 at 09:35, Tao, Yintian wrote:
>> Hi Christian
>>
>>> BUG_ON(in_interrupt());
>> That won't work like this. The KIQ is also used in interrupt context in the driver; that's why we used spin_lock_irqsave().
>> [yttao]: According to the current drm-next code, I have not found anywhere that accesses registers through the KIQ in interrupt context.
>> And you need to wait for free kiq ring buffer space if there is none; waiting in interrupt context is illegal.
> Waiting in atomic context is illegal as well, but we don't have much other choice.
> [yttao]: No, there is no sleep in atomic context in my patch.

I'm not talking about sleeping, but about busy waiting.

> We just need to make sure that waiting never happens by making the buffers large enough, and if it still happens print an error.
> [yttao]: That is not a good choice, because the KMD needs to protect against this instead of hoping users do not invoke KIQ access too frequently.

The only other choice we have is busy waiting, i.e. looping until we get a free slot.

[yttao]: Yes, my patch currently uses msleep() to wait for a free slot. Or do you mean we need to use udelay()?
If we use udelay(), it will be a nightmare under multi-VF. Assuming 16 VFs sharing a 6 ms world switch, in the worst case one VF will udelay(16 * 6 ms = 96 ms) to get one free slot.

Regards,
Christian.

>
>> And I would either say that we should use the trick with the NOP to reserve space on the ring buffer, or call amdgpu_device_wb_get() for each read. amdgpu_device_wb_get() also uses find_first_zero_bit() and should work equally well.
>> [yttao]: Sorry, can you give me more details about how to use the NOP to reserve space? I will use amdgpu_device_wb_get() for the read operation.
> We could use the NOP PM4 command as Felix suggested. This command has a header + length and says that the next X dwords should be ignored on the ring buffer.
>
> But I think using amdgpu_device_wb_get() is better anyway.
> [yttao]: Yes, I agree with the amdgpu_device_wb_get() method, because it prevents the potential read race condition, which the NOP method does not.
>
> Regards,
> Christian.
>
>>
>> -----Original Message-----
>> From: Koenig, Christian <christian.koe...@amd.com>
>> Sent: April 22, 2020 15:23
>> To: Tao, Yintian <yintian....@amd.com>; Liu, Monk <monk....@amd.com>; Liu, Shaoyun <shaoyun....@amd.com>; Kuehling, Felix <felix.kuehl...@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
>>
>>> BUG_ON(in_interrupt());
>> That won't work like this. The KIQ is also used in interrupt context in the driver; that's why we used spin_lock_irqsave().
>>
>> And I would either say that we should use the trick with the NOP to reserve space on the ring buffer, or call amdgpu_device_wb_get() for each read. amdgpu_device_wb_get() also uses find_first_zero_bit() and should work equally well.
>>
>> You also don't need to worry too much about overflowing the wb area.
>> Since we run in an atomic context, we can have at most the number of CPUs in the system + interrupt context here.
>>
>> Regards,
>> Christian.
>>
>> On 22.04.20 at 09:11, Tao, Yintian wrote:
>>> Add Felix and Shaoyun
>>>
>>> -----Original Message-----
>>> From: Yintian Tao <yt...@amd.com>
>>> Sent: April 22, 2020 12:42
>>> To: Koenig, Christian <christian.koe...@amd.com>; Liu, Monk <monk....@amd.com>
>>> Cc: amd-gfx@lists.freedesktop.org; Tao, Yintian <yintian....@amd.com>
>>> Subject: [PATCH] drm/amdgpu: refine kiq access register
>>>
>>> With the current kiq register access method, there is a race condition when multiple clients use the KIQ to read registers at the same time, for example:
>>> 1. client-A starts to read REG-0 through the KIQ
>>> 2. client-A polls for seqno-0
>>> 3. client-B starts to read REG-1 through the KIQ
>>> 4. client-B polls for seqno-1
>>> 5. the KIQ completes both read operations
>>> 6. client-A reads the register value from the wb buffer
>>>    and gets REG-1's value
>>>
>>> In addition, if multiple clients frequently write registers through the KIQ, the KIQ ring buffer may be overwritten.
>>>
>>> Therefore, allocate a fixed number of wb slots for rreg use, and limit the number of in-flight submissions based on the kiq ring_size, in order to prevent the overwrite problem.
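[Editor's note: the shared-slot race described in the commit message above, and the per-read slot bitmap that the patch introduces, can be modeled with a short self-contained Python sketch. This is purely illustrative; the class and method names below are invented for the model and are not the driver's API, and the bitmap loop only mimics what find_first_zero_bit()/__set_bit()/__clear_bit() do in the kernel.]

```python
class SharedSlotKiq:
    """Old scheme: every KIQ read lands in the single shared wb slot."""
    def __init__(self):
        self.wb = [None]  # one writeback slot, like the old kiq->reg_val_offs

    def emit_rreg(self, reg_value):
        self.wb[0] = reg_value  # the KIQ copies the register value here

    def read_back(self):
        return self.wb[0]


class PerReadSlotKiq:
    """New scheme: each read reserves its own slot from a bitmap."""
    def __init__(self, num_slot=64):
        self.wb = [None] * num_slot
        self.used = [False] * num_slot  # models the kiq->used bitmap

    def acquire(self):
        # models find_first_zero_bit() followed by __set_bit()
        for i, busy in enumerate(self.used):
            if not busy:
                self.used[i] = True
                return i
        raise RuntimeError("no free wb slot")

    def emit_rreg(self, slot, reg_value):
        self.wb[slot] = reg_value

    def read_back(self, slot):
        value = self.wb[slot]
        self.used[slot] = False  # models __clear_bit() on restore
        return value


# Reproduce the 6-step sequence from the commit message.
old = SharedSlotKiq()
old.emit_rreg(0xAAAA)      # client-A reads REG-0
old.emit_rreg(0xBBBB)      # client-B reads REG-1 before A polls
a_old = old.read_back()    # client-A now sees REG-1's value: the race

new = PerReadSlotKiq()
sa = new.acquire()         # client-A gets its own slot
sb = new.acquire()         # client-B gets a different slot
new.emit_rreg(sa, 0xAAAA)
new.emit_rreg(sb, 0xBBBB)
a_new = new.read_back(sa)  # client-A still sees REG-0's value
```

With separate slots the two reads can no longer clobber each other, which is why the wb_get-per-read approach prevents the race while the NOP-reservation trick alone would not.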
>>> >>> Signed-off-by: Yintian Tao <yt...@amd.com> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 7 +- >>> .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c | 12 +- >>> .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 12 +- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 129 ++++++++++++++++-- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 6 +- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 6 +- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 13 +- >>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 +- >>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 8 +- >>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 34 +++-- >>> drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 12 +- >>> drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 +- >>> 12 files changed, 211 insertions(+), 48 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> index 4e1d4cfe7a9f..4530e0de4257 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h >>> @@ -526,7 +526,7 @@ static inline void amdgpu_set_ib_value(struct >>> amdgpu_cs_parser *p, >>> /* >>> * Writeback >>> */ >>> -#define AMDGPU_MAX_WB 128 /* Reserve at most 128 WB slots for >>> amdgpu-owned rings. */ >>> +#define AMDGPU_MAX_WB 256 /* Reserve at most 256 WB slots for >>> amdgpu-owned rings. */ >>> >>> struct amdgpu_wb { >>> struct amdgpu_bo *wb_obj; >>> @@ -1028,6 +1028,11 @@ bool amdgpu_device_has_dc_support(struct >>> amdgpu_device *adev); >>> >>> int emu_soc_asic_init(struct amdgpu_device *adev); >>> >>> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read); void >>> +amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq); >>> + >>> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t >>> +*offs); void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, >>> +uint32_t *offs); >>> /* >>> * Registers read & write functions. 
>>> */ >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c >>> index 691c89705bcd..034c9f416499 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c >>> @@ -309,6 +309,7 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void >>> *mqd, >>> uint32_t doorbell_off) >>> { >>> struct amdgpu_device *adev = get_amdgpu_device(kgd); >>> + struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring; >>> struct v10_compute_mqd *m; >>> uint32_t mec, pipe; >>> @@ -324,13 +325,19 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void >>> *mqd, >>> pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n", >>> mec, pipe, queue_id); >>> >>> - spin_lock(&adev->gfx.kiq.ring_lock); >>> + r = amdgpu_gfx_kiq_lock(kiq, false); >>> + if (r) { >>> + pr_err("failed to lock kiq\n"); >>> + goto out_unlock; >>> + } >>> + >>> r = amdgpu_ring_alloc(kiq_ring, 7); >>> if (r) { >>> pr_err("Failed to alloc KIQ (%d).\n", r); >>> goto out_unlock; >>> } >>> >>> + amdgpu_gfx_kiq_consume(kiq, NULL); >>> amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5)); >>> amdgpu_ring_write(kiq_ring, >>> PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* >>> Queue_Sel */ @@ -350,8 +357,9 @@ static int kgd_hiq_mqd_load(struct kgd_dev >>> *kgd, void *mqd, >>> amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi); >>> amdgpu_ring_commit(kiq_ring); >>> >>> + amdgpu_gfx_kiq_restore(kiq, NULL); >>> out_unlock: >>> - spin_unlock(&adev->gfx.kiq.ring_lock); >>> + amdgpu_gfx_kiq_unlock(&adev->gfx.kiq); >>> release_queue(kgd); >>> >>> return r; >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c >>> index df841c2ac5e7..f243d9990ced 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c >>> @@ -307,6 +307,7 
@@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void >>> *mqd, >>> uint32_t doorbell_off) >>> { >>> struct amdgpu_device *adev = get_amdgpu_device(kgd); >>> + struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring; >>> struct v9_mqd *m; >>> uint32_t mec, pipe; >>> @@ -322,13 +323,19 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void >>> *mqd, >>> pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n", >>> mec, pipe, queue_id); >>> >>> - spin_lock(&adev->gfx.kiq.ring_lock); >>> + r = amdgpu_gfx_kiq_lock(kiq, false); >>> + if (r) { >>> + pr_err("failed to lock kiq\n"); >>> + goto out_unlock; >>> + } >>> + >>> r = amdgpu_ring_alloc(kiq_ring, 7); >>> if (r) { >>> pr_err("Failed to alloc KIQ (%d).\n", r); >>> goto out_unlock; >>> } >>> >>> + amdgpu_gfx_kiq_consume(kiq, NULL); >>> amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5)); >>> amdgpu_ring_write(kiq_ring, >>> PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* >>> Queue_Sel */ @@ -348,8 +355,9 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev >>> *kgd, void *mqd, >>> amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi); >>> amdgpu_ring_commit(kiq_ring); >>> >>> + amdgpu_gfx_kiq_restore(kiq, NULL); >>> out_unlock: >>> - spin_unlock(&adev->gfx.kiq.ring_lock); >>> + amdgpu_gfx_kiq_unlock(&adev->gfx.kiq); >>> release_queue(kgd); >>> >>> return r; >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>> index ea576b4260a4..c6b1c77b5eac 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>> @@ -295,23 +295,99 @@ static int amdgpu_gfx_kiq_acquire(struct >>> amdgpu_device *adev, >>> return -EINVAL; >>> } >>> >>> +void amdgpu_gfx_kiq_reg_offs_get(struct amdgpu_kiq *kiq, >>> + uint32_t *offset) >>> +{ >>> + uint32_t tmp = 0; >>> + >>> + tmp = find_first_zero_bit(kiq->used, >>> + kiq->num_slot); >>> + if (tmp < kiq->num_slot) { >>> + __set_bit(tmp, kiq->used); >>> + *offset 
= kiq->reg_val_offs[tmp]; >>> + } >>> +} >>> + >>> +void amdgpu_gfx_kiq_reg_offs_free(struct amdgpu_kiq *kiq, >>> + uint32_t offset) >>> +{ >>> + uint32_t tmp = 0; >>> + >>> + tmp = (offset - kiq->reg_val_offs[0]) >> 3; >>> + if (tmp < kiq->num_slot) >>> + __clear_bit(tmp, kiq->used); >>> +} >>> + >>> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read) { >>> + int retry = MAX_KIQ_REG_TRY; >>> + >>> + BUG_ON(in_interrupt()); >>> + while (retry-- > 0) { >>> + spin_lock(&kiq->ring_lock); >>> + if ((atomic64_read(&kiq->submit_avail) > 0) && >>> + (read ? find_first_zero_bit(kiq->used, kiq->num_slot) < >>> + kiq->num_slot : 1)) { >>> + break; >>> + } else { >>> + spin_unlock(&kiq->ring_lock); >>> + msleep(2); >>> + continue; >>> + } >>> + >>> + } >>> + >>> + if (retry > 0) >>> + return 0; >>> + else >>> + return -ETIME; >>> +} >>> + >>> +void amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq) { >>> + spin_unlock(&kiq->ring_lock); >>> +} >>> + >>> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t *offs) { >>> + atomic64_dec(&kiq->submit_avail); >>> + if (offs) >>> + amdgpu_gfx_kiq_reg_offs_get(kiq, offs); >>> + >>> +} >>> + >>> +void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, uint32_t *offs) { >>> + atomic64_inc(&kiq->submit_avail); >>> + if (offs) >>> + amdgpu_gfx_kiq_reg_offs_free(kiq, *offs); } >>> + >>> int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev, >>> struct amdgpu_ring *ring, >>> struct amdgpu_irq_src *irq) >>> { >>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> - int r = 0; >>> + int r = 0, i, j; >>> >>> spin_lock_init(&kiq->ring_lock); >>> >>> - r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs); >>> - if (r) >>> - return r; >>> + for (i = 0; i < MAX_KIQ_RREG_NUM; i++) { >>> + r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs[i]); >>> + if (r) { >>> + for (j = 0; j < i; j++) >>> + amdgpu_device_wb_free(adev, >>> kiq->reg_val_offs[j]); >>> + return r; >>> + } >>> + } >>> >>> ring->adev = NULL; >>> ring->ring_obj = NULL; >>> 
ring->use_doorbell = true; >>> ring->doorbell_index = adev->doorbell_index.kiq; >>> + kiq->num_slot = MAX_KIQ_RREG_NUM; >>> + memset(&kiq->used, 0, sizeof(kiq->used)); >>> + >>> >>> r = amdgpu_gfx_kiq_acquire(adev, ring); >>> if (r) >>> @@ -325,13 +401,20 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device >>> *adev, >>> AMDGPU_RING_PRIO_DEFAULT); >>> if (r) >>> dev_warn(adev->dev, "(%d) failed to init kiq ring\n", >>> r); >>> + else >>> + atomic64_set(&kiq->submit_avail, ring->ring_size / 4 / >>> + (ring->funcs->align_mask + 1)); >>> >>> return r; >>> } >>> >>> void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring) { >>> - amdgpu_device_wb_free(ring->adev, ring->adev->gfx.kiq.reg_val_offs); >>> + struct amdgpu_kiq *kiq = &ring->adev->gfx.kiq; >>> + int i = 0; >>> + >>> + for (i = 0; i < MAX_KIQ_RREG_NUM; i++) >>> + amdgpu_device_wb_free(ring->adev, kiq->reg_val_offs[i]); >>> amdgpu_ring_fini(ring); >>> } >>> >>> @@ -671,19 +754,24 @@ int amdgpu_gfx_cp_ecc_error_irq(struct amdgpu_device >>> *adev, uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg) >>> { >>> signed long r, cnt = 0; >>> - unsigned long flags; >>> - uint32_t seq; >>> + uint32_t seq, reg_val_offs = 0, value = 0; >>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> struct amdgpu_ring *ring = &kiq->ring; >>> >>> BUG_ON(!ring->funcs->emit_rreg); >>> >>> - spin_lock_irqsave(&kiq->ring_lock, flags); >>> + r = amdgpu_gfx_kiq_lock(kiq, true); >>> + if (r) { >>> + pr_err("failed to lock kiq\n"); >>> + goto kiq_read_exit; >>> + } >>> + >>> + amdgpu_gfx_kiq_consume(kiq, ®_val_offs); >>> amdgpu_ring_alloc(ring, 32); >>> - amdgpu_ring_emit_rreg(ring, reg); >>> + amdgpu_ring_emit_rreg(ring, reg, reg_val_offs); >>> amdgpu_fence_emit_polling(ring, &seq); >>> amdgpu_ring_commit(ring); >>> - spin_unlock_irqrestore(&kiq->ring_lock, flags); >>> + amdgpu_gfx_kiq_unlock(kiq); >>> >>> r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT); >>> >>> @@ -707,9 +795,14 @@ uint32_t amdgpu_kiq_rreg(struct 
amdgpu_device *adev, >>> uint32_t reg) >>> if (cnt > MAX_KIQ_REG_TRY) >>> goto failed_kiq_read; >>> >>> - return adev->wb.wb[kiq->reg_val_offs]; >>> + mb(); >>> + value = adev->wb.wb[reg_val_offs]; >>> + amdgpu_gfx_kiq_restore(kiq, ®_val_offs); >>> + return value; >>> >>> failed_kiq_read: >>> + amdgpu_gfx_kiq_restore(kiq, ®_val_offs); >>> +kiq_read_exit: >>> pr_err("failed to read reg:%x\n", reg); >>> return ~0; >>> } >>> @@ -717,19 +810,24 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, >>> uint32_t reg) void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t >>> reg, uint32_t v) { >>> signed long r, cnt = 0; >>> - unsigned long flags; >>> uint32_t seq; >>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> struct amdgpu_ring *ring = &kiq->ring; >>> >>> BUG_ON(!ring->funcs->emit_wreg); >>> >>> - spin_lock_irqsave(&kiq->ring_lock, flags); >>> + r = amdgpu_gfx_kiq_lock(kiq, false); >>> + if (r) { >>> + pr_err("failed to lock kiq\n"); >>> + goto kiq_write_exit; >>> + } >>> + >>> + amdgpu_gfx_kiq_consume(kiq, NULL); >>> amdgpu_ring_alloc(ring, 32); >>> amdgpu_ring_emit_wreg(ring, reg, v); >>> amdgpu_fence_emit_polling(ring, &seq); >>> amdgpu_ring_commit(ring); >>> - spin_unlock_irqrestore(&kiq->ring_lock, flags); >>> + amdgpu_gfx_kiq_unlock(kiq); >>> >>> r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT); >>> >>> @@ -754,8 +852,11 @@ void amdgpu_kiq_wreg(struct amdgpu_device *adev, >>> uint32_t reg, uint32_t v) >>> if (cnt > MAX_KIQ_REG_TRY) >>> goto failed_kiq_write; >>> >>> + amdgpu_gfx_kiq_restore(kiq, NULL); >>> return; >>> >>> failed_kiq_write: >>> + amdgpu_gfx_kiq_restore(kiq, NULL); >>> +kiq_write_exit: >>> pr_err("failed to write reg:%x\n", reg); } diff --git >>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h >>> index 634746829024..bb05594c27e0 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h >>> @@ -96,6 +96,7 @@ struct kiq_pm4_funcs { >>> int 
invalidate_tlbs_size; >>> }; >>> >>> +#define MAX_KIQ_RREG_NUM 64 >>> struct amdgpu_kiq { >>> u64 eop_gpu_addr; >>> struct amdgpu_bo *eop_obj; >>> @@ -103,7 +104,10 @@ struct amdgpu_kiq { >>> struct amdgpu_ring ring; >>> struct amdgpu_irq_src irq; >>> const struct kiq_pm4_funcs *pmf; >>> - uint32_t reg_val_offs; >>> + uint32_t reg_val_offs[MAX_KIQ_RREG_NUM]; >>> + atomic64_t submit_avail; >>> + uint32_t num_slot; >>> + unsigned long used[DIV_ROUND_UP(MAX_KIQ_RREG_NUM, >>> BITS_PER_LONG)]; >>> }; >>> >>> /* >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >>> index f61664ee4940..b0ea4f67a6e1 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >>> @@ -181,7 +181,8 @@ struct amdgpu_ring_funcs { >>> void (*end_use)(struct amdgpu_ring *ring); >>> void (*emit_switch_buffer) (struct amdgpu_ring *ring); >>> void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t >>> flags); >>> - void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg); >>> + void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg, >>> + uint32_t reg_val_offs); >>> void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, >>> uint32_t val); >>> void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg, >>> uint32_t val, uint32_t mask); @@ -265,7 >>> +266,7 @@ >>> struct amdgpu_ring { #define amdgpu_ring_emit_hdp_flush(r) >>> (r)->funcs->emit_hdp_flush((r)) #define amdgpu_ring_emit_switch_buffer(r) >>> (r)->funcs->emit_switch_buffer((r)) >>> #define amdgpu_ring_emit_cntxcntl(r, d) >>> (r)->funcs->emit_cntxcntl((r), (d)) -#define >>> amdgpu_ring_emit_rreg(r, >>> d) (r)->funcs->emit_rreg((r), (d)) >>> +#define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), >>> +(d), >>> +(o)) >>> #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), >>> (v)) #define amdgpu_ring_emit_reg_wait(r, d, v, m) >>> (r)->funcs->emit_reg_wait((r), (d), (v), (m)) #define >>> 
amdgpu_ring_emit_reg_write_reg_wait(r, d0, d1, v, m) >>> (r)->funcs->emit_reg_write_reg_wait((r), (d0), (d1), (v), (m)) @@ -308,6 >>> +309,7 @@ static inline void amdgpu_ring_write(struct amdgpu_ring *ring, >>> uint32_t v) { >>> if (ring->count_dw <= 0) >>> DRM_ERROR("amdgpu: writing more dwords to the ring than >>> expected!\n"); >>> + >>> ring->ring[ring->wptr++ & ring->buf_mask] = v; >>> ring->wptr &= ring->ptr_mask; >>> ring->count_dw--; >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>> index 8c10084f44ef..dd54ea802a9a 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c >>> @@ -56,13 +56,19 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct >>> amdgpu_device *adev, >>> unsigned long flags; >>> uint32_t seq; >>> >>> - spin_lock_irqsave(&kiq->ring_lock, flags); >>> + r = amdgpu_gfx_kiq_lock(kiq, false); >>> + if (r) { >>> + pr_err("failed to lock kiq\n"); >>> + goto failed_exit; >>> + } >>> + >>> + amdgpu_gfx_kiq_consume(kiq, NULL); >>> amdgpu_ring_alloc(ring, 32); >>> amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1, >>> ref, mask); >>> amdgpu_fence_emit_polling(ring, &seq); >>> amdgpu_ring_commit(ring); >>> - spin_unlock_irqrestore(&kiq->ring_lock, flags); >>> + amdgpu_gfx_kiq_unlock(kiq); >>> >>> r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT); >>> >>> @@ -80,9 +86,12 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct >>> amdgpu_device *adev, >>> if (cnt > MAX_KIQ_REG_TRY) >>> goto failed_kiq; >>> >>> + amdgpu_gfx_kiq_restore(kiq, NULL); >>> return; >>> >>> failed_kiq: >>> + amdgpu_gfx_kiq_restore(kiq, NULL); >>> +failed_exit: >>> pr_err("failed to write reg %x wait reg %x\n", reg0, reg1); } >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >>> index 0a03e2ad5d95..7853dbc3c8bd 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c 
>>> @@ -7594,10 +7594,10 @@ static void gfx_v10_0_ring_emit_frame_cntl(struct >>> amdgpu_ring *ring, bool start, >>> amdgpu_ring_write(ring, v | FRAME_CMD(start ? 0 : 1)); } >>> >>> -static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, >>> uint32_t reg) >>> +static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t >>> reg, >>> + uint32_t reg_val_offs) >>> { >>> struct amdgpu_device *adev = ring->adev; >>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> >>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >>> amdgpu_ring_write(ring, 0 | /* src: register*/ >>> @@ -7606,9 +7606,9 @@ static void gfx_v10_0_ring_emit_rreg(struct >>> amdgpu_ring *ring, uint32_t reg) >>> amdgpu_ring_write(ring, reg); >>> amdgpu_ring_write(ring, 0); >>> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >>> - kiq->reg_val_offs * 4)); >>> + reg_val_offs * 4)); >>> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >>> - kiq->reg_val_offs * 4)); >>> + reg_val_offs * 4)); >>> } >>> >>> static void gfx_v10_0_ring_emit_wreg(struct amdgpu_ring *ring, >>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >>> index fc6c2f2bc76c..bdbd92d4fe45 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >>> @@ -6383,10 +6383,10 @@ static void >>> gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne >>> ring->ring[offset] = (ring->ring_size >> 2) - offset + >>> cur; } >>> >>> -static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, >>> uint32_t reg) >>> +static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg, >>> + uint32_t reg_val_offs) >>> { >>> struct amdgpu_device *adev = ring->adev; >>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> >>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >>> amdgpu_ring_write(ring, 0 | /* src: register*/ >>> @@ -6395,9 +6395,9 @@ static void gfx_v8_0_ring_emit_rreg(struct 
>>> amdgpu_ring *ring, uint32_t reg) >>> amdgpu_ring_write(ring, reg); >>> amdgpu_ring_write(ring, 0); >>> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >>> - kiq->reg_val_offs * 4)); >>> + reg_val_offs * 4)); >>> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >>> - kiq->reg_val_offs * 4)); >>> + reg_val_offs * 4)); >>> } >>> >>> static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring *ring, >>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>> index 84fcf842316d..f7dd1036986e 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>> @@ -4042,14 +4042,20 @@ static int gfx_v9_0_soft_reset(void *handle) >>> static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev) { >>> signed long r, cnt = 0; >>> - unsigned long flags; >>> - uint32_t seq; >>> + uint32_t seq, reg_val_offs; >>> + uint64_t value = 0; >>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> struct amdgpu_ring *ring = &kiq->ring; >>> >>> BUG_ON(!ring->funcs->emit_rreg); >>> >>> - spin_lock_irqsave(&kiq->ring_lock, flags); >>> + r = amdgpu_gfx_kiq_lock(kiq, true); >>> + if (r) { >>> + pr_err("failed to lock kiq\n"); >>> + goto failed_kiq_exit; >>> + } >>> + >>> + amdgpu_gfx_kiq_consume(kiq, ®_val_offs); >>> amdgpu_ring_alloc(ring, 32); >>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >>> amdgpu_ring_write(ring, 9 | /* src: register*/ >>> @@ -4059,12 +4065,12 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct >>> amdgpu_device *adev) >>> amdgpu_ring_write(ring, 0); >>> amdgpu_ring_write(ring, 0); >>> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >>> - kiq->reg_val_offs * 4)); >>> + reg_val_offs * 4)); >>> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >>> - kiq->reg_val_offs * 4)); >>> + reg_val_offs * 4)); >>> amdgpu_fence_emit_polling(ring, &seq); >>> amdgpu_ring_commit(ring); >>> - spin_unlock_irqrestore(&kiq->ring_lock, flags); >>> + 
amdgpu_gfx_kiq_unlock(kiq); >>> >>> r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT); >>> >>> @@ -4088,10 +4094,14 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct >>> amdgpu_device *adev) >>> if (cnt > MAX_KIQ_REG_TRY) >>> goto failed_kiq_read; >>> >>> - return (uint64_t)adev->wb.wb[kiq->reg_val_offs] | >>> - (uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] << 32ULL; >>> + value = (uint64_t)adev->wb.wb[reg_val_offs] | >>> + (uint64_t)adev->wb.wb[reg_val_offs + 1 ] << 32ULL; >>> >>> + amdgpu_gfx_kiq_restore(kiq, ®_val_offs); >>> + return value; >>> failed_kiq_read: >>> + amdgpu_gfx_kiq_restore(kiq, ®_val_offs); >>> +failed_kiq_exit: >>> pr_err("failed to read gpu clock\n"); >>> return ~0; >>> } >>> @@ -5482,10 +5492,10 @@ static void >>> gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne >>> ring->ring[offset] = (ring->ring_size>>2) - offset + >>> cur; } >>> >>> -static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, >>> uint32_t reg) >>> +static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg, >>> + uint32_t reg_val_offs) >>> { >>> struct amdgpu_device *adev = ring->adev; >>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> >>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >>> amdgpu_ring_write(ring, 0 | /* src: register*/ >>> @@ -5494,9 +5504,9 @@ static void gfx_v9_0_ring_emit_rreg(struct >>> amdgpu_ring *ring, uint32_t reg) >>> amdgpu_ring_write(ring, reg); >>> amdgpu_ring_write(ring, 0); >>> amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >>> - kiq->reg_val_offs * 4)); >>> + reg_val_offs * 4)); >>> amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >>> - kiq->reg_val_offs * 4)); >>> + reg_val_offs * 4)); >>> } >>> >>> static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring *ring, >>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>> index 30b75d79efdb..6b4dc0ed4fff 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c 
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c >>> @@ -422,20 +422,28 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct >>> amdgpu_device *adev, >>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>> >>> if (amdgpu_emu_mode == 0 && ring->sched.ready) { >>> - spin_lock(&adev->gfx.kiq.ring_lock); >>> + r = amdgpu_gfx_kiq_lock(kiq, false); >>> + if (r) { >>> + pr_err("failed to lock kiq\n"); >>> + return -ETIME; >>> + } >>> + >>> /* 2 dwords flush + 8 dwords fence */ >>> amdgpu_ring_alloc(ring, kiq->pmf->invalidate_tlbs_size >>> + 8); >>> + amdgpu_gfx_kiq_consume(kiq, NULL); >>> kiq->pmf->kiq_invalidate_tlbs(ring, >>> pasid, flush_type, all_hub); >>> amdgpu_fence_emit_polling(ring, &seq); >>> amdgpu_ring_commit(ring); >>> - spin_unlock(&adev->gfx.kiq.ring_lock); >>> + amdgpu_gfx_kiq_unlock(kiq); >>> r = amdgpu_fence_wait_polling(ring, seq, >>> adev->usec_timeout); >>> if (r < 1) { >>> DRM_ERROR("wait for kiq fence error: %ld.\n", >>> r); >>> + amdgpu_gfx_kiq_restore(kiq, false); >>> return -ETIME; >>> } >>> >>> + amdgpu_gfx_kiq_restore(kiq, NULL); >>> return 0; >>> } >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>> index fecdbc471983..6238702c80b4 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c >>> @@ -613,7 +613,13 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct >>> amdgpu_device *adev, >>> if (vega20_xgmi_wa) >>> ndw += kiq->pmf->invalidate_tlbs_size; >>> >>> - spin_lock(&adev->gfx.kiq.ring_lock); >>> + r = amdgpu_gfx_kiq_lock(kiq, false); >>> + if (r) { >>> + pr_err("failed to lock kiq\n"); >>> + return -ETIME; >>> + } >>> + >>> + amdgpu_gfx_kiq_consume(kiq, NULL); >>> /* 2 dwords flush + 8 dwords fence */ >>> amdgpu_ring_alloc(ring, ndw); >>> if (vega20_xgmi_wa) >>> @@ -623,13 +629,15 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct >>> amdgpu_device *adev, >>> pasid, flush_type, all_hub); >>> amdgpu_fence_emit_polling(ring, &seq); >>> 
>>> 	amdgpu_ring_commit(ring);
>>> -	spin_unlock(&adev->gfx.kiq.ring_lock);
>>> +	amdgpu_gfx_kiq_unlock(kiq);
>>> 	r = amdgpu_fence_wait_polling(ring, seq, adev->usec_timeout);
>>> 	if (r < 1) {
>>> 		DRM_ERROR("wait for kiq fence error: %ld.\n", r);
>>> +		amdgpu_gfx_kiq_restore(kiq, NULL);
>>> 		return -ETIME;
>>> 	}
>>>
>>> +	amdgpu_gfx_kiq_restore(kiq, NULL);
>>> 	return 0;
>>> }
>>>
>>> --
>>> 2.17.1
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx