amdgpu: Guard against write accesses after device removal

Andrey Grodzovsky Fri, 05 Feb 2021 08:22:57 -0800

Daniel, ping. Also, please refer to the other thread with Bjorn from pci-dev
on the same topic I added you to.


Andrey

On 1/29/21 2:25 PM, Christian König wrote:

On 29.01.21 at 18:35, Andrey Grodzovsky wrote:
On 1/29/21 10:16 AM, Christian König wrote:
On 28.01.21 at 18:23, Andrey Grodzovsky wrote:
On 1/19/21 1:59 PM, Christian König wrote:
On 19.01.21 at 19:22, Andrey Grodzovsky wrote:
On 1/19/21 1:05 PM, Daniel Vetter wrote:
[SNIP]
So, say, write in a loop to some harmless scratch register many times, both for the plugged
and unplugged case, and measure the total time delta?
I think we should at least measure the following:

1. Writing X times to a scratch reg without your patch.
2. Writing X times to a scratch reg with your patch.
3. Writing X times to a scratch reg with the hardware physically disconnected.
I suggest repeating that once for Polaris (or older) and once for Vega or Navi.
The SRBM on Polaris is meant to introduce some delay in each access, so it might react differently than the newer hardware.
Christian.
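
For illustration only, the measurements listed above could be driven by a loop like the one below. This is a userspace sketch, not the attached test code: scratch_reg is an ordinary variable standing in for a scratch register, and write_fn, plain_write and time_writes are hypothetical names. It uses clock() from the C standard library, so it reports CPU time rather than wall time.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Stand-in for a harmless scratch register; a real test would target MMIO. */
static volatile uint32_t scratch_reg;

/* Hypothetical write hook so the same loop can time different variants. */
typedef void (*write_fn)(uint32_t value);

/* Unguarded write, corresponding to the "without the patch" case. */
static void plain_write(uint32_t value)
{
    scratch_reg = value;
}

/* Time `count` back-to-back writes and return the elapsed CPU time in ns. */
static int64_t time_writes(write_fn write, uint32_t count)
{
    clock_t start, end;
    uint32_t i;

    assert(write);

    start = clock();
    for (i = 0; i < count; i++)
        write(i);
    end = clock();

    return (int64_t)((double)(end - start) * 1e9 / CLOCKS_PER_SEC);
}
```

Calling time_writes once with plain_write and once with a guarded variant, using the same count, would give the total time delta asked about above.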
See attached results and the testing code. Ran on Polaris (gfx8) and Vega10 (gfx9).
In summary, over 1 million WREG32 in a loop, with and without this patch, you get around 10ms of accumulated overhead (so a 0.00001 millisecond penalty for each WREG32) for using the drm_dev_enter check when writing registers.
P.S. Bullet 3 I cannot test, as I need an eGPU and currently I don't have one.
Well, if I'm not completely mistaken, that is 100ms of accumulated overhead. So around 100ns per write. And an even bigger problem is that this is a ~67% increase.
My bad. And 67% from what? How do you calculate it?
My bad, (308501-209689)/209689=47% increase.
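
The corrected figures can be rechecked with a few lines of C. This is only a sketch: the function names are made up for illustration, and the thread does not state the unit of the two raw totals. Assuming they are microseconds for the 1 million writes mentioned above, the delta works out to about 0.0988 us, i.e. roughly 99ns per write, which would match the "around 100ns" estimate.

```c
#include <assert.h>

/* Relative increase of `patched` over `baseline`, in percent. */
static double percent_increase(double baseline, double patched)
{
    assert(baseline > 0.0);
    return (patched - baseline) / baseline * 100.0;
}

/* Extra cost per write, in the same time unit as the two totals. */
static double overhead_per_write(double baseline, double patched, double writes)
{
    assert(writes > 0.0);
    return (patched - baseline) / writes;
}
```

With the totals quoted above, percent_increase(209689, 308501) comes out at about 47.1, and overhead_per_write(209689, 308501, 1e6) at about 0.0988 in whatever unit the totals use.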
I'm not sure how many writes we do during normal operation, but that sounds like a bit much. Ideas?
Well, you suggested moving the drm_dev_enter way up, but as I see it the problem with this is that it increases the chance of a race where the device is extracted after we check drm_dev_enter (there is also such a chance even when it's placed inside WREG32, but it's lower).
Earlier I proposed that, instead of scattering all those guards all over the code, we simply delay the release of system memory pages and the unreserve of MMIO ranges until after the device itself is gone, after the last drm device reference is dropped. But Daniel opposes delaying the MMIO range unreserve until after the PCI remove code, because according to him it will upset the PCI subsystem.
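
As a rough picture of the per-write guard being discussed, here is a userspace mock in C. It is not the patch itself: struct mock_device, mock_dev_enter, mock_dev_exit and guarded_write are hypothetical names, and a plain atomic flag stands in for the real drm_dev_enter/drm_dev_exit pair, which is SRCU based. The mock also does not model the physical removal race described above; it only shows where the check sits relative to the write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace mock; none of these names are real DRM or amdgpu symbols. */
struct mock_device {
    atomic_bool unplugged;       /* set once the device has been removed */
    volatile uint32_t scratch;   /* stand-in for an MMIO register */
};

/* Returns true while the device is still present (mock of the enter check). */
static bool mock_dev_enter(struct mock_device *dev)
{
    return !atomic_load(&dev->unplugged);
}

/* Nothing to release in the mock; the real exit ends an SRCU read section. */
static void mock_dev_exit(struct mock_device *dev)
{
    (void)dev;
}

/* Guarded write: skipped once the device is marked as gone. */
static bool guarded_write(struct mock_device *dev, uint32_t value)
{
    assert(dev);

    if (!mock_dev_enter(dev))
        return false;

    dev->scratch = value;
    mock_dev_exit(dev);
    return true;
}
```

Moving the check further up the call chain would amortize its cost over many writes, at the price of a longer window between the check and the last write, which is the trade-off raised above.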
Yeah, that's most likely true as well.

Maybe Daniel has another idea when he's back from vacation.

Christian.
Andrey
Christian.
_______________________________________________
amd-gfx mailing list
amd-...@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

Re: [PATCH v4 11/14] drm/amdgpu: Guard against write accesses after device removal
