Hi Christian,

Please see inline comments

-----Original Message-----
From: Koenig, Christian <christian.koe...@amd.com> 
Sent: April 22, 2020 15:54
To: Tao, Yintian <yintian....@amd.com>; Liu, Monk <monk....@amd.com>; Liu, 
Shaoyun <shaoyun....@amd.com>; Kuehling, Felix <felix.kuehl...@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: refine kiq access register

Am 22.04.20 um 09:49 schrieb Tao, Yintian:
> Hi Christian
>
>
> Please see inline comments.
> -----Original Message-----
> From: Christian König <ckoenig.leichtzumer...@gmail.com>
> Sent: April 22, 2020 15:40
> To: Tao, Yintian <yintian....@amd.com>; Koenig, Christian 
> <christian.koe...@amd.com>; Liu, Monk <monk....@amd.com>; Liu, Shaoyun 
> <shaoyun....@amd.com>; Kuehling, Felix <felix.kuehl...@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
>
> Am 22.04.20 um 09:35 schrieb Tao, Yintian:
>> Hi  Christian
>>
>>
>>> BUG_ON(in_interrupt());
>> That won't work like this. The KIQ is also used in interrupt context in the 
>> driver, that's why we used spin_lock_irqsave().
>> [yttao]: According to the current drm-next code, I have not found any place 
>> where registers are accessed through KIQ in interrupt context.
>>              And you need to wait for free KIQ ring buffer space if there is 
>> none; waiting for that in interrupt context is illegal.
> Waiting in atomic context is illegal as well, but we don't have much other 
> choice.
> [yttao]: no, there is no sleeping in atomic context in my patch.

I'm not talking about sleeping, but about busy waiting.

> We just need to make sure that waiting never happens by making the buffers 
> large enough, and if it still happens print an error.
> [yttao]: this is not a good choice, because the KMD needs to protect against 
> this itself instead of hoping that users do not invoke KIQ access too often.

The only other choice we have is busy waiting, e.g. loop until we get a free 
slot.
[yttao]: Yes, my patch currently uses msleep() to wait. Or do you mean we need 
to use udelay()? If we use udelay(), it will be a nightmare under multi-VF.
                Assuming 16 VFs with a 6 ms world switch each, the worst case 
is that one VF will udelay(16 * 6 ms = 96 ms) to get one free slot.


Regards,
Christian.

>
>> And I would either say that we should use the trick with the NOP to reserve 
>> space on the ring buffer or call amdgpu_device_wb_get() for each read. 
>> amdgpu_device_wb_get() also uses find_first_zero_bit() and should work 
>> equally well.
>> [yttao]: sorry, can you give me more details about how to use NOP to reserve 
>> space? I will use amdgpu_device_wb_get() for the read operation.
> We could use the NOP PM4 command as Felix suggested; this command has a 
> header+length and says that the next X dwords on the ring buffer should be 
> ignored.
>
> But I think using amdgpu_device_wb_get() is better anyway.
> [yttao]: yes, I agree with the amdgpu_device_wb_get() method because it 
> will prevent the potential read race condition, while the NOP method will 
> not prevent it.
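
A rough sketch of the per-read write-back variant, as I understand the 
suggestion: amdgpu_device_wb_get()/amdgpu_device_wb_free() are the existing 
helpers, the emit_rreg() signature taking reg_val_offs is the one introduced 
by this patch, and the retry handling of amdgpu_kiq_rreg() is left out:

/* Sketch only: give every read its own write-back slot so concurrent
 * readers can never pick up each other's result. */
static uint32_t kiq_rreg_wb_per_read(struct amdgpu_device *adev, uint32_t reg)
{
        struct amdgpu_kiq *kiq = &adev->gfx.kiq;
        struct amdgpu_ring *ring = &kiq->ring;
        uint32_t reg_val_offs, seq, value = ~0;
        unsigned long flags;

        if (amdgpu_device_wb_get(adev, &reg_val_offs))
                return ~0;                      /* no free wb slot left */

        spin_lock_irqsave(&kiq->ring_lock, flags);
        amdgpu_ring_alloc(ring, 32);
        amdgpu_ring_emit_rreg(ring, reg, reg_val_offs); /* patched signature */
        amdgpu_fence_emit_polling(ring, &seq);
        amdgpu_ring_commit(ring);
        spin_unlock_irqrestore(&kiq->ring_lock, flags);

        if (amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT) > 0)
                value = adev->wb.wb[reg_val_offs];      /* CP wrote it here */

        amdgpu_device_wb_free(adev, reg_val_offs);      /* slot reusable again */
        return value;
}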
>
> Regards,
> Christian.
>
>>
>>
>> -----Original Message-----
>> From: Koenig, Christian <christian.koe...@amd.com>
>> Sent: April 22, 2020 15:23
>> To: Tao, Yintian <yintian....@amd.com>; Liu, Monk <monk....@amd.com>; 
>> Liu, Shaoyun <shaoyun....@amd.com>; Kuehling, Felix 
>> <felix.kuehl...@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Subject: Re: [PATCH] drm/amdgpu: refine kiq access register
>>
>>> BUG_ON(in_interrupt());
>> That won't work like this. The KIQ is also used in interrupt context in the 
>> driver, that's why we used spin_lock_irqsave().
>>
>> And I would either say that we should use the trick with the NOP to reserve 
>> space on the ring buffer or call amdgpu_device_wb_get() for each read. 
>> amdgpu_device_wb_get() also uses find_first_zero_bit() and should work 
>> equally well.
>>
>> You also don't need to worry too much about overflowing the wb area.
>> Since we run in an atomic context we can have at most the number of CPUs in 
>> the system + interrupt context here.
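
(As a back-of-the-envelope bound, assuming at most one in-flight KIQ read per 
CPU plus one from interrupt context; num_possible_cpus() is the generic 
kernel helper:)

/* Rough sizing sketch: concurrent KIQ readers in atomic context are bounded
 * by one per CPU plus one extra for interrupt context. */
unsigned int max_inflight_kiq_reads = num_possible_cpus() + 1;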
>>
>> Regards,
>> Christian.
>>
>> Am 22.04.20 um 09:11 schrieb Tao, Yintian:
>>> Add Felix and Shaoyun
>>>
>>> -----Original Message-----
>>> From: Yintian Tao <yt...@amd.com>
>>> Sent: April 22, 2020 12:42
>>> To: Koenig, Christian <christian.koe...@amd.com>; Liu, Monk 
>>> <monk....@amd.com>
>>> Cc: amd-gfx@lists.freedesktop.org; Tao, Yintian 
>>> <yintian....@amd.com>
>>> Subject: [PATCH] drm/amdgpu: refine kiq access register
>>>
>>> According to the current KIQ register access method, there will be a race 
>>> condition when using KIQ to read registers if multiple clients want to read 
>>> at the same time, just like the example below:
>>> 1. client-A starts to read REG-0 through KIQ
>>> 2. client-A polls seqno-0
>>> 3. client-B starts to read REG-1 through KIQ
>>> 4. client-B polls seqno-1
>>> 5. the KIQ completes these two read operations
>>> 6. client-A reads the register from the wb buffer and
>>>       gets the REG-1 value
>>>
>>> And if multiple clients frequently write registers through KIQ, the KIQ 
>>> ring buffer may be overwritten.
>>>
>>> Therefore, allocate a fixed number of wb slots for rreg use and limit the 
>>> number of submissions, based on the KIQ ring_size, in order to prevent the 
>>> overwrite problem.
>>>
>>> Signed-off-by: Yintian Tao <yt...@amd.com>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   7 +-
>>>     .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c    |  12 +-
>>>     .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |  12 +-
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c       | 129 ++++++++++++++++--
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h       |   6 +-
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h      |   6 +-
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c      |  13 +-
>>>     drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c        |   8 +-
>>>     drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c         |   8 +-
>>>     drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c         |  34 +++--
>>>     drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c        |  12 +-
>>>     drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c         |  12 +-
>>>     12 files changed, 211 insertions(+), 48 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> index 4e1d4cfe7a9f..4530e0de4257 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
>>> @@ -526,7 +526,7 @@ static inline void amdgpu_set_ib_value(struct 
>>> amdgpu_cs_parser *p,
>>>     /*
>>>      * Writeback
>>>      */
>>> -#define AMDGPU_MAX_WB 128  /* Reserve at most 128 WB slots for 
>>> amdgpu-owned rings. */
>>> +#define AMDGPU_MAX_WB 256  /* Reserve at most 256 WB slots for 
>>> amdgpu-owned rings. */
>>>     
>>>     struct amdgpu_wb {
>>>             struct amdgpu_bo        *wb_obj;
>>> @@ -1028,6 +1028,11 @@ bool amdgpu_device_has_dc_support(struct
>>> amdgpu_device *adev);
>>>     
>>>     int emu_soc_asic_init(struct amdgpu_device *adev);
>>>     
>>> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read); void 
>>> +amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq);
>>> +
>>> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t 
>>> +*offs); void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, 
>>> +uint32_t *offs);
>>>     /*
>>>      * Registers read & write functions.
>>>      */
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
>>> index 691c89705bcd..034c9f416499 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v10.c
>>> @@ -309,6 +309,7 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void 
>>> *mqd,
>>>                                 uint32_t doorbell_off)
>>>     {
>>>             struct amdgpu_device *adev = get_amdgpu_device(kgd);
>>> +   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>             struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
>>>             struct v10_compute_mqd *m;
>>>             uint32_t mec, pipe;
>>> @@ -324,13 +325,19 @@ static int kgd_hiq_mqd_load(struct kgd_dev *kgd, void 
>>> *mqd,
>>>             pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
>>>                      mec, pipe, queue_id);
>>>     
>>> -   spin_lock(&adev->gfx.kiq.ring_lock);
>>> +   r = amdgpu_gfx_kiq_lock(kiq, false);
>>> +   if (r) {
>>> +           pr_err("failed to lock kiq\n");
>>> +           goto out_unlock;
>>> +   }
>>> +
>>>             r = amdgpu_ring_alloc(kiq_ring, 7);
>>>             if (r) {
>>>                     pr_err("Failed to alloc KIQ (%d).\n", r);
>>>                     goto out_unlock;
>>>             }
>>>     
>>> +   amdgpu_gfx_kiq_consume(kiq, NULL);
>>>             amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
>>>             amdgpu_ring_write(kiq_ring,
>>>                               PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* 
>>> Queue_Sel */ @@ -350,8 +357,9 @@ static int kgd_hiq_mqd_load(struct kgd_dev 
>>> *kgd, void *mqd,
>>>             amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi);
>>>             amdgpu_ring_commit(kiq_ring);
>>>     
>>> +   amdgpu_gfx_kiq_restore(kiq, NULL);
>>>     out_unlock:
>>> -   spin_unlock(&adev->gfx.kiq.ring_lock);
>>> +   amdgpu_gfx_kiq_unlock(&adev->gfx.kiq);
>>>             release_queue(kgd);
>>>     
>>>             return r;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>>> index df841c2ac5e7..f243d9990ced 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
>>> @@ -307,6 +307,7 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void 
>>> *mqd,
>>>                                 uint32_t doorbell_off)
>>>     {
>>>             struct amdgpu_device *adev = get_amdgpu_device(kgd);
>>> +   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>             struct amdgpu_ring *kiq_ring = &adev->gfx.kiq.ring;
>>>             struct v9_mqd *m;
>>>             uint32_t mec, pipe;
>>> @@ -322,13 +323,19 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev *kgd, void 
>>> *mqd,
>>>             pr_debug("kfd: set HIQ, mec:%d, pipe:%d, queue:%d.\n",
>>>                      mec, pipe, queue_id);
>>>     
>>> -   spin_lock(&adev->gfx.kiq.ring_lock);
>>> +   r = amdgpu_gfx_kiq_lock(kiq, false);
>>> +   if (r) {
>>> +           pr_err("failed to lock kiq\n");
>>> +           goto out_unlock;
>>> +   }
>>> +
>>>             r = amdgpu_ring_alloc(kiq_ring, 7);
>>>             if (r) {
>>>                     pr_err("Failed to alloc KIQ (%d).\n", r);
>>>                     goto out_unlock;
>>>             }
>>>     
>>> +   amdgpu_gfx_kiq_consume(kiq, NULL);
>>>             amdgpu_ring_write(kiq_ring, PACKET3(PACKET3_MAP_QUEUES, 5));
>>>             amdgpu_ring_write(kiq_ring,
>>>                               PACKET3_MAP_QUEUES_QUEUE_SEL(0) | /* 
>>> Queue_Sel */ @@ -348,8 +355,9 @@ int kgd_gfx_v9_hiq_mqd_load(struct kgd_dev 
>>> *kgd, void *mqd,
>>>             amdgpu_ring_write(kiq_ring, m->cp_hqd_pq_wptr_poll_addr_hi);
>>>             amdgpu_ring_commit(kiq_ring);
>>>     
>>> +   amdgpu_gfx_kiq_restore(kiq, NULL);
>>>     out_unlock:
>>> -   spin_unlock(&adev->gfx.kiq.ring_lock);
>>> +   amdgpu_gfx_kiq_unlock(&adev->gfx.kiq);
>>>             release_queue(kgd);
>>>     
>>>             return r;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> index ea576b4260a4..c6b1c77b5eac 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>>> @@ -295,23 +295,99 @@ static int amdgpu_gfx_kiq_acquire(struct 
>>> amdgpu_device *adev,
>>>             return -EINVAL;
>>>     }
>>>     
>>> +void amdgpu_gfx_kiq_reg_offs_get(struct amdgpu_kiq *kiq,
>>> +                            uint32_t *offset)
>>> +{
>>> +   uint32_t tmp = 0;
>>> +
>>> +   tmp = find_first_zero_bit(kiq->used,
>>> +                                 kiq->num_slot);
>>> +   if (tmp < kiq->num_slot) {
>>> +           __set_bit(tmp, kiq->used);
>>> +           *offset = kiq->reg_val_offs[tmp];
>>> +   }
>>> +}
>>> +
>>> +void amdgpu_gfx_kiq_reg_offs_free(struct amdgpu_kiq *kiq,
>>> +                            uint32_t offset)
>>> +{
>>> +   uint32_t tmp = 0;
>>> +
>>> +   tmp = (offset - kiq->reg_val_offs[0]) >> 3;
>>> +   if (tmp < kiq->num_slot)
>>> +           __clear_bit(tmp, kiq->used);
>>> +}
>>> +
>>> +int amdgpu_gfx_kiq_lock(struct amdgpu_kiq *kiq, bool read) {
>>> +   int retry = MAX_KIQ_REG_TRY;
>>> +
>>> +   BUG_ON(in_interrupt());
>>> +   while (retry-- > 0) {
>>> +           spin_lock(&kiq->ring_lock);
>>> +           if ((atomic64_read(&kiq->submit_avail) > 0) &&
>>> +               (read ? find_first_zero_bit(kiq->used, kiq->num_slot) <
>>> +                kiq->num_slot : 1)) {
>>> +                           break;
>>> +           } else {
>>> +                   spin_unlock(&kiq->ring_lock);
>>> +                   msleep(2);
>>> +                   continue;
>>> +           }
>>> +
>>> +   }
>>> +
>>> +   if (retry > 0)
>>> +           return 0;
>>> +   else
>>> +           return -ETIME;
>>> +}
>>> +
>>> +void amdgpu_gfx_kiq_unlock(struct amdgpu_kiq *kiq) {
>>> +   spin_unlock(&kiq->ring_lock);
>>> +}
>>> +
>>> +void amdgpu_gfx_kiq_consume(struct amdgpu_kiq *kiq, uint32_t *offs) {
>>> +   atomic64_dec(&kiq->submit_avail);
>>> +   if (offs)
>>> +           amdgpu_gfx_kiq_reg_offs_get(kiq, offs);
>>> +
>>> +}
>>> +
>>> +void amdgpu_gfx_kiq_restore(struct amdgpu_kiq *kiq, uint32_t *offs) {
>>> +   atomic64_inc(&kiq->submit_avail);
>>> +   if (offs)
>>> +           amdgpu_gfx_kiq_reg_offs_free(kiq, *offs); }
>>> +
>>>     int amdgpu_gfx_kiq_init_ring(struct amdgpu_device *adev,
>>>                                  struct amdgpu_ring *ring,
>>>                                  struct amdgpu_irq_src *irq)
>>>     {
>>>             struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>> -   int r = 0;
>>> +   int r = 0, i, j;
>>>     
>>>             spin_lock_init(&kiq->ring_lock);
>>>     
>>> -   r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs);
>>> -   if (r)
>>> -           return r;
>>> +   for (i = 0; i < MAX_KIQ_RREG_NUM; i++) {
>>> +           r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs[i]);
>>> +           if (r) {
>>> +                   for (j = 0; j < i; j++)
>>> +                           amdgpu_device_wb_free(adev, 
>>> kiq->reg_val_offs[j]);
>>> +                   return r;
>>> +           }
>>> +   }
>>>     
>>>             ring->adev = NULL;
>>>             ring->ring_obj = NULL;
>>>             ring->use_doorbell = true;
>>>             ring->doorbell_index = adev->doorbell_index.kiq;
>>> +   kiq->num_slot = MAX_KIQ_RREG_NUM;
>>> +   memset(&kiq->used, 0, sizeof(kiq->used));
>>> +
>>>     
>>>             r = amdgpu_gfx_kiq_acquire(adev, ring);
>>>             if (r)
>>> @@ -325,13 +401,20 @@ int amdgpu_gfx_kiq_init_ring(struct amdgpu_device 
>>> *adev,
>>>                                  AMDGPU_RING_PRIO_DEFAULT);
>>>             if (r)
>>>                     dev_warn(adev->dev, "(%d) failed to init kiq ring\n", 
>>> r);
>>> +   else
>>> +           atomic64_set(&kiq->submit_avail, ring->ring_size / 4 /
>>> +                        (ring->funcs->align_mask + 1));
>>>     
>>>             return r;
>>>     }
>>>     
>>>     void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring)  {
>>> -   amdgpu_device_wb_free(ring->adev, ring->adev->gfx.kiq.reg_val_offs);
>>> +   struct amdgpu_kiq *kiq = &ring->adev->gfx.kiq;
>>> +   int i = 0;
>>> +
>>> +   for (i = 0; i < MAX_KIQ_RREG_NUM; i++)
>>> +           amdgpu_device_wb_free(ring->adev, kiq->reg_val_offs[i]);
>>>             amdgpu_ring_fini(ring);
>>>     }
>>>     
>>> @@ -671,19 +754,24 @@ int amdgpu_gfx_cp_ecc_error_irq(struct amdgpu_device 
>>> *adev,  uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t reg)  
>>> {
>>>             signed long r, cnt = 0;
>>> -   unsigned long flags;
>>> -   uint32_t seq;
>>> +   uint32_t seq, reg_val_offs = 0, value = 0;
>>>             struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>             struct amdgpu_ring *ring = &kiq->ring;
>>>     
>>>             BUG_ON(!ring->funcs->emit_rreg);
>>>     
>>> -   spin_lock_irqsave(&kiq->ring_lock, flags);
>>> +   r = amdgpu_gfx_kiq_lock(kiq, true);
>>> +   if (r) {
>>> +           pr_err("failed to lock kiq\n");
>>> +           goto kiq_read_exit;
>>> +   }
>>> +
>>> +   amdgpu_gfx_kiq_consume(kiq, &reg_val_offs);
>>>             amdgpu_ring_alloc(ring, 32);
>>> -   amdgpu_ring_emit_rreg(ring, reg);
>>> +   amdgpu_ring_emit_rreg(ring, reg, reg_val_offs);
>>>             amdgpu_fence_emit_polling(ring, &seq);
>>>             amdgpu_ring_commit(ring);
>>> -   spin_unlock_irqrestore(&kiq->ring_lock, flags);
>>> +   amdgpu_gfx_kiq_unlock(kiq);
>>>     
>>>             r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>>>     
>>> @@ -707,9 +795,14 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, 
>>> uint32_t reg)
>>>             if (cnt > MAX_KIQ_REG_TRY)
>>>                     goto failed_kiq_read;
>>>     
>>> -   return adev->wb.wb[kiq->reg_val_offs];
>>> +   mb();
>>> +   value = adev->wb.wb[reg_val_offs];
>>> +   amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
>>> +   return value;
>>>     
>>>     failed_kiq_read:
>>> +   amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
>>> +kiq_read_exit:
>>>             pr_err("failed to read reg:%x\n", reg);
>>>             return ~0;
>>>     }
>>> @@ -717,19 +810,24 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, 
>>> uint32_t reg)  void amdgpu_kiq_wreg(struct amdgpu_device *adev, uint32_t 
>>> reg, uint32_t v)  {
>>>             signed long r, cnt = 0;
>>> -   unsigned long flags;
>>>             uint32_t seq;
>>>             struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>             struct amdgpu_ring *ring = &kiq->ring;
>>>     
>>>             BUG_ON(!ring->funcs->emit_wreg);
>>>     
>>> -   spin_lock_irqsave(&kiq->ring_lock, flags);
>>> +   r = amdgpu_gfx_kiq_lock(kiq, false);
>>> +   if (r) {
>>> +           pr_err("failed to lock kiq\n");
>>> +           goto kiq_write_exit;
>>> +   }
>>> +
>>> +   amdgpu_gfx_kiq_consume(kiq, NULL);
>>>             amdgpu_ring_alloc(ring, 32);
>>>             amdgpu_ring_emit_wreg(ring, reg, v);
>>>             amdgpu_fence_emit_polling(ring, &seq);
>>>             amdgpu_ring_commit(ring);
>>> -   spin_unlock_irqrestore(&kiq->ring_lock, flags);
>>> +   amdgpu_gfx_kiq_unlock(kiq);
>>>     
>>>             r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>>>     
>>> @@ -754,8 +852,11 @@ void amdgpu_kiq_wreg(struct amdgpu_device *adev, 
>>> uint32_t reg, uint32_t v)
>>>             if (cnt > MAX_KIQ_REG_TRY)
>>>                     goto failed_kiq_write;
>>>     
>>> +   amdgpu_gfx_kiq_restore(kiq, NULL);
>>>             return;
>>>     
>>>     failed_kiq_write:
>>> +   amdgpu_gfx_kiq_restore(kiq, NULL);
>>> +kiq_write_exit:
>>>             pr_err("failed to write reg:%x\n", reg);  } diff --git 
>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>>> index 634746829024..bb05594c27e0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
>>> @@ -96,6 +96,7 @@ struct kiq_pm4_funcs {
>>>             int invalidate_tlbs_size;
>>>     };
>>>     
>>> +#define MAX_KIQ_RREG_NUM 64
>>>     struct amdgpu_kiq {
>>>             u64                     eop_gpu_addr;
>>>             struct amdgpu_bo        *eop_obj;
>>> @@ -103,7 +104,10 @@ struct amdgpu_kiq {
>>>             struct amdgpu_ring      ring;
>>>             struct amdgpu_irq_src   irq;
>>>             const struct kiq_pm4_funcs *pmf;
>>> -   uint32_t                        reg_val_offs;
>>> +   uint32_t                reg_val_offs[MAX_KIQ_RREG_NUM];
>>> +   atomic64_t              submit_avail;
>>> +   uint32_t                num_slot;
>>> +   unsigned long           used[DIV_ROUND_UP(MAX_KIQ_RREG_NUM, 
>>> BITS_PER_LONG)];
>>>     };
>>>     
>>>     /*
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> index f61664ee4940..b0ea4f67a6e1 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h
>>> @@ -181,7 +181,8 @@ struct amdgpu_ring_funcs {
>>>             void (*end_use)(struct amdgpu_ring *ring);
>>>             void (*emit_switch_buffer) (struct amdgpu_ring *ring);
>>>             void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t 
>>> flags);
>>> -   void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg);
>>> +   void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg,
>>> +                     uint32_t reg_val_offs);
>>>             void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, 
>>> uint32_t val);
>>>             void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg,
>>>                                   uint32_t val, uint32_t mask); @@ -265,7 
>>> +266,7 @@ 
>>> struct amdgpu_ring {  #define amdgpu_ring_emit_hdp_flush(r) 
>>> (r)->funcs->emit_hdp_flush((r))  #define amdgpu_ring_emit_switch_buffer(r) 
>>> (r)->funcs->emit_switch_buffer((r))
>>>     #define amdgpu_ring_emit_cntxcntl(r, d) 
>>> (r)->funcs->emit_cntxcntl((r), (d)) -#define 
>>> amdgpu_ring_emit_rreg(r,
>>> d) (r)->funcs->emit_rreg((r), (d))
>>> +#define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), 
>>> +(d),
>>> +(o))
>>>     #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), 
>>> (v))  #define amdgpu_ring_emit_reg_wait(r, d, v, m) 
>>> (r)->funcs->emit_reg_wait((r), (d), (v), (m))  #define 
>>> amdgpu_ring_emit_reg_write_reg_wait(r, d0, d1, v, m) 
>>> (r)->funcs->emit_reg_write_reg_wait((r), (d0), (d1), (v), (m)) @@ -308,6 
>>> +309,7 @@ static inline void amdgpu_ring_write(struct amdgpu_ring *ring, 
>>> uint32_t v)  {
>>>             if (ring->count_dw <= 0)
>>>                     DRM_ERROR("amdgpu: writing more dwords to the ring than 
>>> expected!\n");
>>> +
>>>             ring->ring[ring->wptr++ & ring->buf_mask] = v;
>>>             ring->wptr &= ring->ptr_mask;
>>>             ring->count_dw--;
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>>> index 8c10084f44ef..dd54ea802a9a 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
>>> @@ -56,13 +56,19 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct 
>>> amdgpu_device *adev,
>>>             unsigned long flags;
>>>             uint32_t seq;
>>>     
>>> -   spin_lock_irqsave(&kiq->ring_lock, flags);
>>> +   r = amdgpu_gfx_kiq_lock(kiq, false);
>>> +   if (r) {
>>> +           pr_err("failed to lock kiq\n");
>>> +           goto failed_exit;
>>> +   }
>>> +
>>> +   amdgpu_gfx_kiq_consume(kiq, NULL);
>>>             amdgpu_ring_alloc(ring, 32);
>>>             amdgpu_ring_emit_reg_write_reg_wait(ring, reg0, reg1,
>>>                                                 ref, mask);
>>>             amdgpu_fence_emit_polling(ring, &seq);
>>>             amdgpu_ring_commit(ring);
>>> -   spin_unlock_irqrestore(&kiq->ring_lock, flags);
>>> +   amdgpu_gfx_kiq_unlock(kiq);
>>>     
>>>             r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>>>     
>>> @@ -80,9 +86,12 @@ void amdgpu_virt_kiq_reg_write_reg_wait(struct 
>>> amdgpu_device *adev,
>>>             if (cnt > MAX_KIQ_REG_TRY)
>>>                     goto failed_kiq;
>>>     
>>> +   amdgpu_gfx_kiq_restore(kiq, NULL);
>>>             return;
>>>     
>>>     failed_kiq:
>>> +   amdgpu_gfx_kiq_restore(kiq, NULL);
>>> +failed_exit:
>>>             pr_err("failed to write reg %x wait reg %x\n", reg0, reg1);  }
>>>     
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> index 0a03e2ad5d95..7853dbc3c8bd 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
>>> @@ -7594,10 +7594,10 @@ static void gfx_v10_0_ring_emit_frame_cntl(struct 
>>> amdgpu_ring *ring, bool start,
>>>             amdgpu_ring_write(ring, v | FRAME_CMD(start ? 0 : 1));  }
>>>     
>>> -static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, 
>>> uint32_t reg)
>>> +static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t 
>>> reg,
>>> +                                uint32_t reg_val_offs)
>>>     {
>>>             struct amdgpu_device *adev = ring->adev;
>>> -   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>     
>>>             amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>>>             amdgpu_ring_write(ring, 0 |     /* src: register*/
>>> @@ -7606,9 +7606,9 @@ static void gfx_v10_0_ring_emit_rreg(struct 
>>> amdgpu_ring *ring, uint32_t reg)
>>>             amdgpu_ring_write(ring, reg);
>>>             amdgpu_ring_write(ring, 0);
>>>             amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>>> -                           kiq->reg_val_offs * 4));
>>> +                           reg_val_offs * 4));
>>>             amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>>> -                           kiq->reg_val_offs * 4));
>>> +                           reg_val_offs * 4));
>>>     }
>>>     
>>>     static void gfx_v10_0_ring_emit_wreg(struct amdgpu_ring *ring, 
>>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> index fc6c2f2bc76c..bdbd92d4fe45 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c
>>> @@ -6383,10 +6383,10 @@ static void 
>>> gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>>>                     ring->ring[offset] = (ring->ring_size >> 2) - offset + 
>>> cur;  }
>>>     
>>> -static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, 
>>> uint32_t reg)
>>> +static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
>>> +                               uint32_t reg_val_offs)
>>>     {
>>>             struct amdgpu_device *adev = ring->adev;
>>> -   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>     
>>>             amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>>>             amdgpu_ring_write(ring, 0 |     /* src: register*/
>>> @@ -6395,9 +6395,9 @@ static void gfx_v8_0_ring_emit_rreg(struct 
>>> amdgpu_ring *ring, uint32_t reg)
>>>             amdgpu_ring_write(ring, reg);
>>>             amdgpu_ring_write(ring, 0);
>>>             amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>>> -                           kiq->reg_val_offs * 4));
>>> +                           reg_val_offs * 4));
>>>             amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>>> -                           kiq->reg_val_offs * 4));
>>> +                           reg_val_offs * 4));
>>>     }
>>>     
>>>     static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring *ring, 
>>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> index 84fcf842316d..f7dd1036986e 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
>>> @@ -4042,14 +4042,20 @@ static int gfx_v9_0_soft_reset(void *handle)  
>>> static uint64_t gfx_v9_0_kiq_read_clock(struct amdgpu_device *adev)  {
>>>             signed long r, cnt = 0;
>>> -   unsigned long flags;
>>> -   uint32_t seq;
>>> +   uint32_t seq, reg_val_offs;
>>> +   uint64_t value = 0;
>>>             struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>             struct amdgpu_ring *ring = &kiq->ring;
>>>     
>>>             BUG_ON(!ring->funcs->emit_rreg);
>>>     
>>> -   spin_lock_irqsave(&kiq->ring_lock, flags);
>>> +   r = amdgpu_gfx_kiq_lock(kiq, true);
>>> +   if (r) {
>>> +           pr_err("failed to lock kiq\n");
>>> +           goto failed_kiq_exit;
>>> +   }
>>> +
>>> +   amdgpu_gfx_kiq_consume(kiq, &reg_val_offs);
>>>             amdgpu_ring_alloc(ring, 32);
>>>             amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>>>             amdgpu_ring_write(ring, 9 |     /* src: register*/
>>> @@ -4059,12 +4065,12 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct 
>>> amdgpu_device *adev)
>>>             amdgpu_ring_write(ring, 0);
>>>             amdgpu_ring_write(ring, 0);
>>>             amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>>> -                           kiq->reg_val_offs * 4));
>>> +                           reg_val_offs * 4));
>>>             amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>>> -                           kiq->reg_val_offs * 4));
>>> +                           reg_val_offs * 4));
>>>             amdgpu_fence_emit_polling(ring, &seq);
>>>             amdgpu_ring_commit(ring);
>>> -   spin_unlock_irqrestore(&kiq->ring_lock, flags);
>>> +   amdgpu_gfx_kiq_unlock(kiq);
>>>     
>>>             r = amdgpu_fence_wait_polling(ring, seq, MAX_KIQ_REG_WAIT);
>>>     
>>> @@ -4088,10 +4094,14 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct 
>>> amdgpu_device *adev)
>>>             if (cnt > MAX_KIQ_REG_TRY)
>>>                     goto failed_kiq_read;
>>>     
>>> -   return (uint64_t)adev->wb.wb[kiq->reg_val_offs] |
>>> -           (uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] << 32ULL;
>>> +   value = (uint64_t)adev->wb.wb[reg_val_offs] |
>>> +           (uint64_t)adev->wb.wb[reg_val_offs + 1 ] << 32ULL;
>>>     
>>> +   amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
>>> +   return value;
>>>     failed_kiq_read:
>>> +   amdgpu_gfx_kiq_restore(kiq, &reg_val_offs);
>>> +failed_kiq_exit:
>>>             pr_err("failed to read gpu clock\n");
>>>             return ~0;
>>>     }
>>> @@ -5482,10 +5492,10 @@ static void 
>>> gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne
>>>                     ring->ring[offset] = (ring->ring_size>>2) - offset + 
>>> cur;  }
>>>     
>>> -static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, 
>>> uint32_t reg)
>>> +static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t reg,
>>> +                               uint32_t reg_val_offs)
>>>     {
>>>             struct amdgpu_device *adev = ring->adev;
>>> -   struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>     
>>>             amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4));
>>>             amdgpu_ring_write(ring, 0 |     /* src: register*/
>>> @@ -5494,9 +5504,9 @@ static void gfx_v9_0_ring_emit_rreg(struct 
>>> amdgpu_ring *ring, uint32_t reg)
>>>             amdgpu_ring_write(ring, reg);
>>>             amdgpu_ring_write(ring, 0);
>>>             amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr +
>>> -                           kiq->reg_val_offs * 4));
>>> +                           reg_val_offs * 4));
>>>             amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr +
>>> -                           kiq->reg_val_offs * 4));
>>> +                           reg_val_offs * 4));
>>>     }
>>>     
>>>     static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring *ring, 
>>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> index 30b75d79efdb..6b4dc0ed4fff 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c
>>> @@ -422,20 +422,28 @@ static int gmc_v10_0_flush_gpu_tlb_pasid(struct 
>>> amdgpu_device *adev,
>>>             struct amdgpu_kiq *kiq = &adev->gfx.kiq;
>>>     
>>>             if (amdgpu_emu_mode == 0 && ring->sched.ready) {
>>> -           spin_lock(&adev->gfx.kiq.ring_lock);
>>> +           r = amdgpu_gfx_kiq_lock(kiq, false);
>>> +           if (r) {
>>> +                   pr_err("failed to lock kiq\n");
>>> +                   return -ETIME;
>>> +           }
>>> +
>>>                     /* 2 dwords flush + 8 dwords fence */
>>>                     amdgpu_ring_alloc(ring, kiq->pmf->invalidate_tlbs_size 
>>> + 8);
>>> +           amdgpu_gfx_kiq_consume(kiq, NULL);
>>>                     kiq->pmf->kiq_invalidate_tlbs(ring,
>>>                                             pasid, flush_type, all_hub);
>>>                     amdgpu_fence_emit_polling(ring, &seq);
>>>                     amdgpu_ring_commit(ring);
>>> -           spin_unlock(&adev->gfx.kiq.ring_lock);
>>> +           amdgpu_gfx_kiq_unlock(kiq);
>>>                     r = amdgpu_fence_wait_polling(ring, seq, 
>>> adev->usec_timeout);
>>>                     if (r < 1) {
>>>                             DRM_ERROR("wait for kiq fence error: %ld.\n", 
>>> r);
>>> +                   amdgpu_gfx_kiq_restore(kiq, false);
>>>                             return -ETIME;
>>>                     }
>>>     
>>> +           amdgpu_gfx_kiq_restore(kiq, NULL);
>>>                     return 0;
>>>             }
>>>     
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> index fecdbc471983..6238702c80b4 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
>>> @@ -613,7 +613,13 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct 
>>> amdgpu_device *adev,
>>>                     if (vega20_xgmi_wa)
>>>                             ndw += kiq->pmf->invalidate_tlbs_size;
>>>     
>>> -           spin_lock(&adev->gfx.kiq.ring_lock);
>>> +           r = amdgpu_gfx_kiq_lock(kiq, false);
>>> +           if (r) {
>>> +                   pr_err("failed to lock kiq\n");
>>> +                   return -ETIME;
>>> +           }
>>> +
>>> +           amdgpu_gfx_kiq_consume(kiq, NULL);
>>>                     /* 2 dwords flush + 8 dwords fence */
>>>                     amdgpu_ring_alloc(ring, ndw);
>>>                     if (vega20_xgmi_wa)
>>> @@ -623,13 +629,15 @@ static int gmc_v9_0_flush_gpu_tlb_pasid(struct 
>>> amdgpu_device *adev,
>>>                                             pasid, flush_type, all_hub);
>>>                     amdgpu_fence_emit_polling(ring, &seq);
>>>                     amdgpu_ring_commit(ring);
>>> -           spin_unlock(&adev->gfx.kiq.ring_lock);
>>> +           amdgpu_gfx_kiq_unlock(kiq);
>>>                     r = amdgpu_fence_wait_polling(ring, seq, 
>>> adev->usec_timeout);
>>>                     if (r < 1) {
>>>                             DRM_ERROR("wait for kiq fence error: %ld.\n", 
>>> r);
>>> +                   amdgpu_gfx_kiq_restore(kiq, NULL);
>>>                             return -ETIME;
>>>                     }
>>>     
>>> +           amdgpu_gfx_kiq_restore(kiq, NULL);
>>>                     return 0;
>>>             }
>>>     
>>> --
>>> 2.17.1
>>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
