Sure, that's fine Do you have particular suggestion for problem 2 ? how we avoid commands being overwritten before it's finished
_____________________________________ Monk Liu|GPU Virtualization Team |AMD -----Original Message----- From: Christian König <[email protected]> Sent: Monday, April 20, 2020 4:17 PM To: Liu, Monk <[email protected]>; Koenig, Christian <[email protected]>; Kuehling, Felix <[email protected]>; Tao, Yintian <[email protected]> Cc: [email protected] Subject: Re: [PATCH] drm/amdgpu: refine kiq read register > Yintian's patch has nothing to do with the result you mentioned .... the > command being overwritten by new initiated commands is a inherent bug, why > you put those two stuff together ? Yintian patch made the situation absolutely worse. Instead of a whole ring buffer wrap around to overwrite things (1024 dw at least) you now just need to use up 30 dw to trigger undefined behavior and most likely a lockup. And as Felix pointed out the patch even writes over the end of the ring buffer and can cause random corruption to whatever there is. > What about let Yintian to provide one patch to address all those two problem > ? so way what you worried about won't happen ? Yes, please do so. But please make also sure that the original patch is reverted before this starts to cause fallout from testers. Regards, Christian. Am 20.04.20 um 09:39 schrieb Liu, Monk: >>>> Instead of this crude hack please let us just allocate a fixed >>>> number of write back slots and use them round robin > It looks doable but really ugly compared with current patch ... and > more over there we are going to fix the second problem eventually > > What about let Yintian to provide one patch to address all those two problem > ? so way what you worried about won't happen ? > _____________________________________ > Monk Liu|GPU Virtualization Team |AMD > > > -----Original Message----- > From: Liu, Monk > Sent: Monday, April 20, 2020 3:37 PM > To: Koenig, Christian <[email protected]>; Kuehling, Felix > <[email protected]>; Tao, Yintian <[email protected]> > Cc: [email protected] > Subject: RE: [PATCH] drm/amdgpu: refine kiq read register > >>>> Previously we ended up with an invalid value in a concurrent register >>>> read, now the KIQs overwrites its own commands and most likely causes a >>>> hang or the hardware to execute something random. > Yintian's patch has nothing to do with the result you mentioned .... the > command being overwritten by new initiated commands is a inherent bug, why > you put those two stuff together ? > > > > _____________________________________ > Monk Liu|GPU Virtualization Team |AMD > > > -----Original Message----- > From: Koenig, Christian <[email protected]> > Sent: Monday, April 20, 2020 3:19 PM > To: Liu, Monk <[email protected]>; Kuehling, Felix > <[email protected]>; Tao, Yintian <[email protected]> > Cc: [email protected] > Subject: Re: [PATCH] drm/amdgpu: refine kiq read register > > Hi Monk, > >> Can we first get the first problem done ? > Please absolutely not! See the problem introduced here is quite worse than > the actual fix. > > Previously we ended up with an invalid value in a concurrent register read, > now the KIQs overwrites its own commands and most likely causes a hang or the > hardware to execute something random. > > Instead of this crude hack please let us just allocate a fixed number of > write back slots and use them round robin. Then we can make sure that we > don't have more than a fixed number of reads in flight at the same time as > well using the fence values. > > This should fix both problems at the same time and not introduce another > potential problematic hack. > > If this patch is already committed please revert it immediately. > > Regards, > Christian. > > Am 20.04.20 um 08:20 schrieb Liu, Monk: >> Christian >> >>>>> Well I was under the assumption that this is actually what is done here. >> If that is not the case the patch is a rather clear NAK. >> <<< >> >> There are two kinds of problems in the current KIQ reading reg, Yintian tend >> to fix one of them but not all ... >> >> The first problem is : >> During the sleep of the first KIQ reading, another KIQ reading initiated an >> the read back register value flushed the first readback value, thus the >> first reading will get the wrong result. >> This is the issue yintian's patch to address, by put the readback >> value not in a shared WB but in a chunk DW of command submit >> >> The second problem is: >> Since we don't utilize GPU scheduler for KIQ submit thus if the KIQ >> is busy with some commands then those unfinished commands maybe overwritten >> by a new command submit, and that's not the Problem yintian's patch tend to >> address. Felex pointed it out which is fine and we can use another patch to >> address it, I'm also planning and scoping it. >> >> The optional way is: >> 1) We use GPU scheduler to manage KIQ activity, and all jobs are >> submitted to KIQ through a IB, thus no overwritten will happen >> 2) we still skip gpu scheduler but always use IB to put jobs on KIQ, >> thus each JOB will occupy the fixed space/DW of RB, so we can avoid >> overwrite unfinished command >> >> We can discuss the second problem later >> >> Can we first get the first problem done ? thanks >> >> >> _____________________________________ >> Monk Liu|GPU Virtualization Team |AMD >> >> >> -----Original Message----- >> From: Christian König <[email protected]> >> Sent: Monday, April 20, 2020 1:03 AM >> To: Kuehling, Felix <[email protected]>; Tao, Yintian >> <[email protected]>; Liu, Monk <[email protected]> >> Cc: [email protected] >> Subject: Re: [PATCH] drm/amdgpu: refine kiq read register >> >> Am 17.04.20 um 17:39 schrieb Felix Kuehling: >>> Am 2020-04-17 um 2:53 a.m. schrieb Yintian Tao: >>>> According to the current kiq read register method, there will be >>>> race condition when using KIQ to read register if multiple clients >>>> want to read at same time just like the expample below: >>>> 1. client-A start to read REG-0 throguh KIQ 2. client-A poll the >>>> seqno-0 3. client-B start to read REG-1 through KIQ 4. client-B >>>> poll the seqno-1 5. the kiq complete these two read operation 6. >>>> client-A to read the register at the wb buffer and >>>> get REG-1 value >>>> >>>> Therefore, directly make kiq write the register value at the ring >>>> buffer then there will be no race condition for the wb buffer. >>>> >>>> v2: supply the read_clock and move the reg_val_offs back >>>> >>>> Signed-off-by: Yintian Tao <[email protected]> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 11 ++++------ >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 1 - >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 5 +++-- >>>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 14 +++++------- >>>> drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 14 +++++------- >>>> drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 28 ++++++++++++------------ >>>> 6 files changed, 33 insertions(+), 40 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>>> index ea576b4260a4..4e1c0239e561 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c >>>> @@ -304,10 +304,6 @@ int amdgpu_gfx_kiq_init_ring(struct >>>> amdgpu_device *adev, >>>> >>>> spin_lock_init(&kiq->ring_lock); >>>> >>>> - r = amdgpu_device_wb_get(adev, &kiq->reg_val_offs); >>>> - if (r) >>>> - return r; >>>> - >>>> ring->adev = NULL; >>>> ring->ring_obj = NULL; >>>> ring->use_doorbell = true; >>>> @@ -331,7 +327,6 @@ int amdgpu_gfx_kiq_init_ring(struct >>>> amdgpu_device *adev, >>>> >>>> void amdgpu_gfx_kiq_free_ring(struct amdgpu_ring *ring) >>>> { >>>> - amdgpu_device_wb_free(ring->adev, ring->adev->gfx.kiq.reg_val_offs); >>>> amdgpu_ring_fini(ring); >>>> } >>>> >>>> @@ -675,12 +670,14 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, >>>> uint32_t reg) >>>> uint32_t seq; >>>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>>> struct amdgpu_ring *ring = &kiq->ring; >>>> + uint64_t reg_val_offs = 0; >>>> >>>> BUG_ON(!ring->funcs->emit_rreg); >>>> >>>> spin_lock_irqsave(&kiq->ring_lock, flags); >>>> amdgpu_ring_alloc(ring, 32); >>>> - amdgpu_ring_emit_rreg(ring, reg); >>>> + reg_val_offs = (ring->wptr & ring->buf_mask) + 30; >>> I think that should be (ring->wptr + 30) & ring->buf_mask. Otherwise >>> the reg_val_offset can be past the end of the ring. >>> >>> But that still leaves a problem if another command is submitted to >>> the KIQ before you read the returned reg_val from the ring. Your >>> reg_val can be overwritten by the new command and you get the wrong >>> result. Or the command can be overwritten with the reg_val, which >>> will most likely hang the CP. >>> >>> You could allocate space on the KIQ ring with a NOP command to >>> prevent that space from being overwritten by other commands. >> Well I was under the assumption that this is actually what is done here. >> If that is not the case the patch is a rather clear NAK. >> >> Regards, >> Christian. >> >>> Regards, >>> Felix >>> >>> >>>> + amdgpu_ring_emit_rreg(ring, reg, reg_val_offs); >>>> amdgpu_fence_emit_polling(ring, &seq); >>>> amdgpu_ring_commit(ring); >>>> spin_unlock_irqrestore(&kiq->ring_lock, flags); @@ -707,7 >>>> +704,7 @@ uint32_t amdgpu_kiq_rreg(struct amdgpu_device *adev, uint32_t >>>> reg) >>>> if (cnt > MAX_KIQ_REG_TRY) >>>> goto failed_kiq_read; >>>> >>>> - return adev->wb.wb[kiq->reg_val_offs]; >>>> + return ring->ring[reg_val_offs]; >>>> >>>> failed_kiq_read: >>>> pr_err("failed to read reg:%x\n", reg); diff --git >>>> a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h >>>> index 634746829024..ee698f0246d8 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h >>>> @@ -103,7 +103,6 @@ struct amdgpu_kiq { >>>> struct amdgpu_ring ring; >>>> struct amdgpu_irq_src irq; >>>> const struct kiq_pm4_funcs *pmf; >>>> - uint32_t reg_val_offs; >>>> }; >>>> >>>> /* >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >>>> index f61664ee4940..a3d88f2aa9f4 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h >>>> @@ -181,7 +181,8 @@ struct amdgpu_ring_funcs { >>>> void (*end_use)(struct amdgpu_ring *ring); >>>> void (*emit_switch_buffer) (struct amdgpu_ring *ring); >>>> void (*emit_cntxcntl) (struct amdgpu_ring *ring, uint32_t >>>> flags); >>>> - void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg); >>>> + void (*emit_rreg)(struct amdgpu_ring *ring, uint32_t reg, >>>> + uint64_t reg_val_offs); >>>> void (*emit_wreg)(struct amdgpu_ring *ring, uint32_t reg, >>>> uint32_t val); >>>> void (*emit_reg_wait)(struct amdgpu_ring *ring, uint32_t reg, >>>> uint32_t val, uint32_t mask); @@ -265,7 >>>> +266,7 @@ >>>> struct amdgpu_ring { >>>> #define amdgpu_ring_emit_hdp_flush(r) (r)->funcs->emit_hdp_flush((r)) >>>> #define amdgpu_ring_emit_switch_buffer(r) >>>> (r)->funcs->emit_switch_buffer((r)) >>>> #define amdgpu_ring_emit_cntxcntl(r, d) >>>> (r)->funcs->emit_cntxcntl((r), (d)) -#define >>>> amdgpu_ring_emit_rreg(r, >>>> d) (r)->funcs->emit_rreg((r), (d)) >>>> +#define amdgpu_ring_emit_rreg(r, d, o) (r)->funcs->emit_rreg((r), >>>> +(d), (o)) >>>> #define amdgpu_ring_emit_wreg(r, d, v) (r)->funcs->emit_wreg((r), (d), >>>> (v)) >>>> #define amdgpu_ring_emit_reg_wait(r, d, v, m) >>>> (r)->funcs->emit_reg_wait((r), (d), (v), (m)) >>>> #define amdgpu_ring_emit_reg_write_reg_wait(r, d0, d1, v, m) >>>> (r)->funcs->emit_reg_write_reg_wait((r), (d0), (d1), (v), (m)) diff >>>> --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >>>> index 0a03e2ad5d95..7c9a5e440509 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c >>>> @@ -7594,21 +7594,19 @@ static void gfx_v10_0_ring_emit_frame_cntl(struct >>>> amdgpu_ring *ring, bool start, >>>> amdgpu_ring_write(ring, v | FRAME_CMD(start ? 0 : 1)); >>>> } >>>> >>>> -static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, >>>> uint32_t reg) >>>> +static void gfx_v10_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t >>>> reg, >>>> + uint64_t reg_val_offs) >>>> { >>>> - struct amdgpu_device *adev = ring->adev; >>>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>>> - >>>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >>>> amdgpu_ring_write(ring, 0 | /* src: register*/ >>>> (5 << 8) | /* dst: memory */ >>>> (1 << 20)); /* write confirm */ >>>> amdgpu_ring_write(ring, reg); >>>> amdgpu_ring_write(ring, 0); >>>> - amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >>>> - kiq->reg_val_offs * 4)); >>>> - amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >>>> - kiq->reg_val_offs * 4)); >>>> + amdgpu_ring_write(ring, lower_32_bits(ring->gpu_addr + >>>> + reg_val_offs * 4)); >>>> + amdgpu_ring_write(ring, upper_32_bits(ring->gpu_addr + >>>> + reg_val_offs * 4)); >>>> } >>>> >>>> static void gfx_v10_0_ring_emit_wreg(struct amdgpu_ring *ring, >>>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >>>> index fc6c2f2bc76c..8e7eee7838e0 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c >>>> @@ -6383,21 +6383,19 @@ static void >>>> gfx_v8_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, unsigne >>>> ring->ring[offset] = (ring->ring_size >> 2) - offset + >>>> cur; >>>> } >>>> >>>> -static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, >>>> uint32_t reg) >>>> +static void gfx_v8_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t >>>> reg, >>>> + uint64_t reg_val_offs) >>>> { >>>> - struct amdgpu_device *adev = ring->adev; >>>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>>> - >>>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >>>> amdgpu_ring_write(ring, 0 | /* src: register*/ >>>> (5 << 8) | /* dst: memory */ >>>> (1 << 20)); /* write confirm */ >>>> amdgpu_ring_write(ring, reg); >>>> amdgpu_ring_write(ring, 0); >>>> - amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >>>> - kiq->reg_val_offs * 4)); >>>> - amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >>>> - kiq->reg_val_offs * 4)); >>>> + amdgpu_ring_write(ring, lower_32_bits(ring->gpu_addr + >>>> + reg_val_offs * 4)); >>>> + amdgpu_ring_write(ring, upper_32_bits(ring->gpu_addr + >>>> + reg_val_offs * 4)); >>>> } >>>> >>>> static void gfx_v8_0_ring_emit_wreg(struct amdgpu_ring *ring, >>>> uint32_t reg, diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>>> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>>> index 84fcf842316d..ff279b1f5c24 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c >>>> @@ -4046,11 +4046,13 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct >>>> amdgpu_device *adev) >>>> uint32_t seq; >>>> struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>>> struct amdgpu_ring *ring = &kiq->ring; >>>> + uint64_t reg_val_offs = 0; >>>> >>>> BUG_ON(!ring->funcs->emit_rreg); >>>> >>>> spin_lock_irqsave(&kiq->ring_lock, flags); >>>> amdgpu_ring_alloc(ring, 32); >>>> + reg_val_offs = (ring->wptr & ring->buf_mask) + 30; >>>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >>>> amdgpu_ring_write(ring, 9 | /* src: register*/ >>>> (5 << 8) | /* dst: memory */ >>>> @@ -4058,10 +4060,10 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct >>>> amdgpu_device *adev) >>>> (1 << 20)); /* write confirm */ >>>> amdgpu_ring_write(ring, 0); >>>> amdgpu_ring_write(ring, 0); >>>> - amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >>>> - kiq->reg_val_offs * 4)); >>>> - amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >>>> - kiq->reg_val_offs * 4)); >>>> + amdgpu_ring_write(ring, lower_32_bits(ring->gpu_addr + >>>> + reg_val_offs * 4)); >>>> + amdgpu_ring_write(ring, upper_32_bits(ring->gpu_addr + >>>> + reg_val_offs * 4)); >>>> amdgpu_fence_emit_polling(ring, &seq); >>>> amdgpu_ring_commit(ring); >>>> spin_unlock_irqrestore(&kiq->ring_lock, flags); @@ -4088,8 >>>> +4090,8 @@ static uint64_t gfx_v9_0_kiq_read_clock(struct >>>> +amdgpu_device *adev) >>>> if (cnt > MAX_KIQ_REG_TRY) >>>> goto failed_kiq_read; >>>> >>>> - return (uint64_t)adev->wb.wb[kiq->reg_val_offs] | >>>> - (uint64_t)adev->wb.wb[kiq->reg_val_offs + 1 ] << 32ULL; >>>> + return (uint64_t)ring->ring[reg_val_offs] | >>>> + (uint64_t)ring->ring[reg_val_offs + 1 ] << 32ULL; >>>> >>>> failed_kiq_read: >>>> pr_err("failed to read gpu clock\n"); @@ -5482,21 +5484,19 @@ >>>> static void gfx_v9_0_ring_emit_patch_cond_exec(struct amdgpu_ring *ring, >>>> unsigne >>>> ring->ring[offset] = (ring->ring_size>>2) - offset + >>>> cur; >>>> } >>>> >>>> -static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, >>>> uint32_t reg) >>>> +static void gfx_v9_0_ring_emit_rreg(struct amdgpu_ring *ring, uint32_t >>>> reg, >>>> + uint64_t reg_val_offs) >>>> { >>>> - struct amdgpu_device *adev = ring->adev; >>>> - struct amdgpu_kiq *kiq = &adev->gfx.kiq; >>>> - >>>> amdgpu_ring_write(ring, PACKET3(PACKET3_COPY_DATA, 4)); >>>> amdgpu_ring_write(ring, 0 | /* src: register*/ >>>> (5 << 8) | /* dst: memory */ >>>> (1 << 20)); /* write confirm */ >>>> amdgpu_ring_write(ring, reg); >>>> amdgpu_ring_write(ring, 0); >>>> - amdgpu_ring_write(ring, lower_32_bits(adev->wb.gpu_addr + >>>> - kiq->reg_val_offs * 4)); >>>> - amdgpu_ring_write(ring, upper_32_bits(adev->wb.gpu_addr + >>>> - kiq->reg_val_offs * 4)); >>>> + amdgpu_ring_write(ring, lower_32_bits(ring->gpu_addr + >>>> + reg_val_offs * 4)); >>>> + amdgpu_ring_write(ring, upper_32_bits(ring->gpu_addr + >>>> + reg_val_offs * 4)); >>>> } >>>> >>>> static void gfx_v9_0_ring_emit_wreg(struct amdgpu_ring *ring, >>>> uint32_t reg, >>> _______________________________________________ >>> amd-gfx mailing list >>> [email protected] >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fli >>> s >>> t >>> s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7C >>> m >>> o >>> nk.liu%40amd.com%7Cadc62a682782487cc88f08d7e4838ff3%7C3dd8961fe4884e >>> 6 >>> 0 >>> 8e11a82d994e183d%7C0%7C0%7C637229126019563305&sdata=fK5riNpxfUFX >>> G >>> J >>> YjFzfkMETFJX6s6rntpu7CdjRhFDU%3D&reserved=0 > _______________________________________________ > amd-gfx mailing list > [email protected] > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flist > s.freedesktop.org%2Fmailman%2Flistinfo%2Famd-gfx&data=02%7C01%7CMo > nk.Liu%40amd.com%7C3497588e2bd540d132ec08d7e5032cb7%7C3dd8961fe4884e60 > 8e11a82d994e183d%7C0%7C0%7C637229674118038126&sdata=O4z6sIwD%2FLH3 > RSk3bcBRi3RHYLhTeV0An59xBJwqqEQ%3D&reserved=0 _______________________________________________ amd-gfx mailing list [email protected] https://lists.freedesktop.org/mailman/listinfo/amd-gfx
