To be more specific let me give you an example we face with now:
Although KMD now use a cache way to give libdrm the
CC_RB_BACKEND_DISABLE value, but KMD still need to read it at least
one shot during loading driver right ?
For SR-IOV there is still chance that VF2 is doing CPU based tlb flush
while VF1 is doing driver loading, so there are chance that tlb flush
of VF2 was ruined by
The driver loading (one shot CC_RB_BACKEND_DISABLE reading).
Not only tlb flush, other ring’s vm flush of VF2 can all be ruined by
this register’s reading from VF1, right ?
How you solve that ?
/Monk
*From:*amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] *On
Behalf Of *Liu, Monk
*Sent:* 2018年3月8日18:03
*To:* Koenig, Christian <christian.koe...@amd.com>; Deucher, Alexander
<alexander.deuc...@amd.com>; Mao, David <david....@amd.com>
*Cc:* Jin, Jian-Rong <jian-rong....@amd.com>;
amd-gfx@lists.freedesktop.org
*Subject:* RE: deprecated register issues
Yeah, I agree with you we’d better find all those registers
Stitch together the REQ and ACK part only avoid this issue for vm
flush only, and we may still hit the issue in other place, I knew it.
The frustrating job is how can we find all those registers ?
And more is since this CC_RB_BACKEND_DISABLE register is not
deprecated (confirmed with David M), and driver indeed nee to read it
How could we avoid this reading cause vm hub broken ?
I believe like you said there are a bunch of registers (not really
deprecated ) reading would cause vm hub broken, how could we still read
Them when we want but also not to trigger the world switch issue ?
looks to me there is no way to do that, even if you already find out
all of those registers, we still need to access them, so the world
switch (or other issues)
are still going to fail, and that’s why I think the plan B at least
have its reason to stand there.
any thought ?
/Monk
*From:*Christian König [mailto:ckoenig.leichtzumer...@gmail.com]
*Sent:* 2018年3月8日17:41
*To:* Liu, Monk <monk....@amd.com <mailto:monk....@amd.com>>; Deucher,
Alexander <alexander.deuc...@amd.com
<mailto:alexander.deuc...@amd.com>>; Koenig, Christian
<christian.koe...@amd.com <mailto:christian.koe...@amd.com>>; Mao,
David <david....@amd.com <mailto:david....@amd.com>>
*Cc:* Jin, Jian-Rong <jian-rong....@amd.com
<mailto:jian-rong....@amd.com>>; amd-gfx@lists.freedesktop.org
<mailto:amd-gfx@lists.freedesktop.org>
*Subject:* Re: deprecated register issues
Hi Monk,
While we can avoid such vm flush failure by stitch together of the
sending REQ and reading ACK part, at least for compute ring this
is confirmed.
Well there are two misunderstanding here.
First of all this solution doesn't really work, it just hides the
problem because we don't do a world switch in between those two
packets any more. And while we could change the SDMA, UVD and VCE
firmware do to something similar you can't apply this solution to CPU
based flushes.
The second issue is that this isn't related to VMHUB flushing at all,
it's just that VMHUB flushing is the first thing where you notice that
something is wrong.
The real problem is that when you access CC_RB_BACKEND_DISABLE and a
bunch of other registers the bus on Vega10 sometimes gets a hickup and
drops other reads and writes.
So we need to identify those registers and removes all accesses to
them, otherwise working with the hardware will just be horrible
unreliable in general.
Regards,
Christian.
Am 08.03.2018 um 04:05 schrieb Liu, Monk:
Hi Alex
While we can avoid such vm flush failure by stitch together of the
sending REQ and reading ACK part, at least for compute ring this
is confirmed.
And I believe for SDMA ring (even UVD/VCE ring) it could also be
achieved.
But @Koenig, Christian <mailto:christian.koe...@amd.com> insist
stitching together the REQ AND ACK part is not a formal way to fix
the issue, instead just a walkaround and I cannot debate that
What make me worry more is what if there are more registers like
Alex said that behaves like this CC_RB_BACKEND_DISABLE,
since we don’t know their names(too hard to filter them out!) so
we couldn’t remove them all from SR list,
So I still think we need plan B to handle above case, A.K.A use
one package for the REQ and ACK job
/Monk
*From:*Deucher, Alexander
*Sent:* 2018年3月8日10:53
*To:* Liu, Monk <monk....@amd.com> <mailto:monk....@amd.com>;
Koenig, Christian <christian.koe...@amd.com>
<mailto:christian.koe...@amd.com>; Mao, David <david....@amd.com>
<mailto:david....@amd.com>
*Cc:* amd-gfx@lists.freedesktop.org
<mailto:amd-gfx@lists.freedesktop.org>; Jin, Jian-Rong
<jian-rong....@amd.com> <mailto:jian-rong....@amd.com>
*Subject:* Re: deprecated register issues
I think there are more than just CC_RB_BACKEND_DISABLE that could
cause this problem. IIRC, some entire class of gfx registers could
cause it, it just happened that this was one of the only ones we
readback via mmio. Also for the save and restore list, I think
the RLC uses a different interface to read back the registers so
it may not be affected the same way.
Alex
------------------------------------------------------------------------
*From:*Liu, Monk
*Sent:* Wednesday, March 7, 2018 9:42:41 PM
*To:* Deucher, Alexander; Koenig, Christian; Mao, David
*Cc:* amd-gfx@lists.freedesktop.org
<mailto:amd-gfx@lists.freedesktop.org>; Jin, Jian-Rong
*Subject:* RE: deprecated register issues
Hi guys
According to Christian’s found, reading this register would make
vm hub failed to finish the vm flush request , e.g.: sdma is doing
vm flush which first write data to vm_invalidat_req and read
result from vm_invalidate_ack, but found driver will forever
failed to get the correct value from vm_invalidate_ack if the
meantime BIF is reading this CC_RB_BACKEND_DISABLE register.
Now SR-IOV world switch also may get such similar trouble, see
below save_restore_list ( during world_switch, RLCV will save
current VF’s register according to this list and restore all those
registers when loading back this VF)
uint32 register_restore[] = {
(uint32)((0x3000 << 18) | mmPA_SC_FIFO_SIZE), /* SC */
0x00000001,
*(uint32)((0x3000 << 18) | mmCC_RB_BACKEND_DISABLE), /* SC SC
PER_SE */*
*0x00000000,*
*(uint32)((0x3400 << 18) | mmCC_RB_BACKEND_DISABLE), /* SC SC
PER_SE */*
*0x00000000,*
*(uint32)((0x3800 << 18) | mmCC_RB_BACKEND_DISABLE), /* SC SC
PER_SE */*
*0x00000000,*
*(uint32)((0x3c00 << 18) | mmCC_RB_BACKEND_DISABLE), /* SC SC
PER_SE */*
*0x00000000,*
(uint32)((0x3000 << 18) | mmVGT_VTX_VECT_EJECT_REG),
0x00000001,
(uint32)((0x3000 << 18) | mmVGT_DMA_DATA_FIFO_DEPTH), /* IA WD */
0x00000001,
(uint32)((0x3000 << 18) | mmVGT_DMA_REQ_FIFO_DEPTH), /* WD */
0x00000001,
(uint32)((0x3000 << 18) | mmVGT_DRAW_INIT_FIFO_DEPTH), /* WD */
0x00000001,
(uint32)((0x3000 << 18) | mmVGT_CACHE_INVALIDATION), /* IA */
0x00000001,
(uint32)((0x3000 << 18) | mmVGT_RESET_DEBUG), /* WD */
0x00000001,
(uint32)((0x3000 << 18) | mmVGT_FIFO_DEPTHS),
I will do some test against this CC_RB_BACKEND_DISABLE register,
see if vm flush failure issue could be avoided by removing those
four register from SR list
Thanks
/Monk
*From:*Deucher, Alexander
*Sent:* 2018年3月7日23:13
*To:* Koenig, Christian <christian.koe...@amd.com
<mailto:christian.koe...@amd.com>>; Mao, David <david....@amd.com
<mailto:david....@amd.com>>; Liu, Monk <monk....@amd.com
<mailto:monk....@amd.com>>
*Cc:* amd-gfx@lists.freedesktop.org
<mailto:amd-gfx@lists.freedesktop.org>; Jin, Jian-Rong
<jian-rong....@amd.com <mailto:jian-rong....@amd.com>>
*Subject:* Re: deprecated register issues
Right. We ran into issues with reading back that register at
runtime when UMDs queried it when other stuff was in flight, so we
just read it once at startup and cache the results. Now when UMDs
request it, we return the cached value.
Alex
------------------------------------------------------------------------
*From:*Koenig, Christian
*Sent:* Wednesday, March 7, 2018 9:31:13 AM
*To:* Mao, David; Liu, Monk
*Cc:* Deucher, Alexander; amd-gfx@lists.freedesktop.org
<mailto:amd-gfx@lists.freedesktop.org>; Jin, Jian-Rong
*Subject:* Re: deprecated register issues
Hi David,
well I just figured that this is a misunderstanding.
Accessing this register and some other deprecated registers can
cause problem when invalidating VMHUBs.
This register itself isn't deprecated, the wording in a patch
fixing things is just a bit unclear.
Question is is that register still accessed regularly or is it
value cached after startup?
Regards,
Christian.
Am 07.03.2018 um 15:25 schrieb Mao, David:
We requires base driver to provide the mask of disabled RB.
This is why kernel read the CC_RB_BACKEND_DISABLE to collect
the harvest configuration.
Where did you get to know that the register is deprecated?
I think it should still be there.
Best Regards,
David
On Mar 7, 2018, at 9:49 PM, Liu, Monk <monk....@amd.com
<mailto:monk....@amd.com>> wrote:
+ UMD guys
Hi David
Do you know if*GC_USER_RB_BACKEND_DISABLE is still exist
for gfx9/vega10 ?*
**
*We found*CC_RB_BACKEND_DISABLE was deprecated but looks
it is still in use in kmd, so
I want to check with you both of above registers
Thanks
/Monk
*From:*amd-gfx
[mailto:amd-gfx-boun...@lists.freedesktop.org]*On Behalf
Of*Christian K?nig
*Sent:*2018年3月7日20:26
*To:*Liu, Monk <monk....@amd.com
<mailto:monk....@amd.com>>; Deucher, Alexander
<alexander.deuc...@amd.com <mailto:alexander.deuc...@amd.com>>
*Cc:*amd-gfx@lists.freedesktop.org
<mailto:amd-gfx@lists.freedesktop.org>
*Subject:*Re: deprecated register issues
Hi Monk,
I honestly don't have the slightest idea why we are still
accessing CC_RB_BACKEND_DISABLE. Maybe it still contains
some useful values?
Key point was that we needed to stop accessing it all the
time to avoid triggering problems.
Regards,
Christian.
Am 07.03.2018 um 13:11 schrieb Liu, Monk:
Hi Christian
I remember you and AlexD mentioned that a handful
registers are deprecated for greenland (gfx9)
e.g. CC_RB_BACKEND_DISABLE
do you know why we still have this routine ?
staticu32
gfx_v9_0_get_rb_active_bitmap(structamdgpu_device *adev)
{
u32 data, mask;
data =RREG32_SOC15(GC,
0, mmCC_RB_BACKEND_DISABLE);
data |=RREG32_SOC15(GC,
0, mmGC_USER_RB_BACKEND_DISABLE);
data &= CC_RB_BACKEND_DISABLE__BACKEND_DISABLE_MASK;
data >>=
GC_USER_RB_BACKEND_DISABLE__BACKEND_DISABLE__SHIFT;
mask
=amdgpu_gfx_create_bitmask(adev->gfx.config.max_backends_per_se/
adev->gfx.config.max_sh_per_se);
return(~data) & mask;
}
see that it still read CC_RB_BACKEND_DISABLE
thanks
/Monk
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org <mailto:amd-gfx@lists.freedesktop.org>
https://lists.freedesktop.org/mailman/listinfo/amd-gfx