amdgpu: Comply with implicit fencing rules

Christian König Sat, 22 May 2021 01:30:36 -0700

On 21.05.21 at 20:31, Daniel Vetter wrote:

[SNIP]

We could provide an IOCTL for the BO to change the flag.

That's not the semantics we need.

But could we first figure out the semantics we want to use here?

Because I'm pretty sure we don't actually need those changes at all, and as
said before I'm certainly NAKing things which break existing use cases.

Please read how other drivers do this and at least _try_ to understand
it. I'm really losing my patience here with you NAKing patches you're
not even understanding (or did you actually read and fully understand
the entire story I typed up here, and your NAK is on the entire
thing?). There's not much useful conversation to be had with that
approach. And with drivers I mean kernel + userspace here.

Well to be honest I did fully read that, but I was just too emotionally attached to answer more appropriately in that moment.

And I'm sorry that I reacted emotionally to that, but it is really frustrating that I'm not able to convince you that we have a major problem which affects all drivers and not just amdgpu.

Regarding the reason why I'm NAKing this particular patch: you are breaking existing uAPI for RADV with that. And as a maintainer of the driver I simply have no other choice than saying halt, stop, we can't do it like this.

I'm perfectly aware that I have some holes in my understanding of how ANV or other Vulkan/OpenGL stacks work. But you should probably also admit that you have some holes in how amdgpu works, otherwise I can't imagine why you would suggest a patch which simply breaks RADV.

I mean, we have been working together for years now and I think you know me pretty well. Do you really think I scream bloody hell, we can't do this, without a good reason?

So let's stop throwing half-baked solutions at each other and discuss what we can do to solve the different problems we are both seeing here.

That's the other frustration part: You're trying to fix this purely in
the kernel. This is exactly one of these issues why we require open
source userspace, so that we can fix the issues correctly across the
entire stack. And meanwhile you're steadfastly refusing to even look
at the userspace side of the picture.

Well I do fully understand the userspace side of the picture for the AMD stack. I just don't think we should give userspace that much control over the fences in the dma_resv object without untangling them from resource management.

And RADV is exercising exclusive sync for amdgpu already. You can do submissions to the GFX, Compute and SDMA queues in Vulkan and those currently won't over-synchronize.

When you then send a texture generated by multiple engines to the compositor, the kernel will correctly insert waits for all submissions of the other process.

So this already works for RADV, completely without the IOCTL Jason proposed. IIRC we also have unit tests which exercised that feature for the video decoding use case long before RADV even existed.

And yes, I have to admit that I hadn't thought about the interaction with other drivers when I came up with this, because the rules of that interaction weren't clear to me at that time.

Also I thought through your tlb issue, why are you even putting these
tlb flush fences into the shared dma_resv slots? If you store them
somewhere else in the amdgpu private part, the oversync issue goes
away
- in your ttm bo move callback, you can just make your bo copy job
depend on them too (you have to anyway)
- even for p2p there's not an issue here, because you have the
->move_notify callback, and can then lift the tlb flush fences from
your private place to the shared slots so the exporter can see them.
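
A minimal sketch of the suggestion above, assuming a toy buffer object with two fence lists. This is not amdgpu or dma-buf code: `toy_bo`, `toy_fence_list`, `toy_list_add()` and `toy_bo_lift_private()` are hypothetical names made up for illustration, and the lift step only stands in for what a ->move_notify callback might do.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_FENCES 8

/* Toy stand-in for a fence; only identity matters in this sketch. */
struct toy_fence {
    int id;
};

struct toy_fence_list {
    struct toy_fence *slots[TOY_MAX_FENCES];
    size_t count;
};

/* Hypothetical BO: one list visible to importers, one driver-private. */
struct toy_bo {
    struct toy_fence_list shared;   /* models the shared dma_resv slots */
    struct toy_fence_list private_; /* models a private place for tlb flush fences */
};

void toy_list_add(struct toy_fence_list *l, struct toy_fence *f)
{
    assert(l->count < TOY_MAX_FENCES);
    l->slots[l->count++] = f;
}

/*
 * Assumed move_notify-style step: publish every private fence to the
 * shared list so an exporter can see it. Returns the number lifted.
 */
size_t toy_bo_lift_private(struct toy_bo *bo)
{
    size_t lifted = bo->private_.count;

    for (size_t i = 0; i < lifted; i++)
        toy_list_add(&bo->shared, bo->private_.slots[i]);
    bo->private_.count = 0;
    return lifted;
}
```

In this model a command submission fence goes straight into `shared`, while a tlb flush fence stays in `private_` until `toy_bo_lift_private()` is called, so normal implicit sync never sees it.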

Because adding a shared fence requires that this shared fence signals after the exclusive fence. And this is a perfect example to explain why this is so problematic and also why we currently stumble over that only in amdgpu.

In TTM we have a feature which allows evictions to be pipelined so that they don't wait for the evicting DMA operation. Without that, drivers will stall waiting for their allocations to finish when we need to allocate memory.

For certain use cases this gives you a ~20% fps increase under memory pressure, so it is a really important feature.

This works by adding the fence of the last eviction DMA operation to BOs when their backing store is newly allocated. That's what the ttm_bo_add_move_fence() function you stumbled over is good for: https://elixir.bootlin.com/linux/v5.13-rc2/source/drivers/gpu/drm/ttm/ttm_bo.c#L692

Now the problem is that the application may be terminated before it can complete its command submission. But since resource management only waits for the shared fences when there are some, there is a chance that we free up memory while it is still in use.

Because of this we have some rather crude workarounds in amdgpu. For example, IIRC we manually wait for any potential exclusive fence before freeing memory.
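
The hazard and the workaround described above can be sketched with a toy model. This is not kernel code: `toy_fence`, `toy_resv` and the `toy_resv_*` helpers are hypothetical names invented for illustration. The first idle check only mimics the rule stated above (when shared fences exist, only they are waited on), and the second mimics the amdgpu-style workaround of also waiting for the exclusive fence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_SHARED 4

/* Toy stand-in for a fence: only tracks whether it has signaled. */
struct toy_fence {
    bool signaled;
};

/* Toy stand-in for a reservation object: one exclusive slot, N shared slots. */
struct toy_resv {
    struct toy_fence *excl;
    struct toy_fence *shared[TOY_MAX_SHARED];
    size_t shared_count;
};

/* A missing fence counts as already signaled. */
static bool toy_fence_done(const struct toy_fence *f)
{
    return f == NULL || f->signaled;
}

void toy_resv_add_shared(struct toy_resv *r, struct toy_fence *f)
{
    assert(r->shared_count < TOY_MAX_SHARED);
    r->shared[r->shared_count++] = f;
}

/*
 * Assumed resource-management rule: if there are shared fences, check
 * only those, trusting that each one signals after the exclusive fence.
 */
bool toy_resv_idle_shared_only(const struct toy_resv *r)
{
    if (r->shared_count == 0)
        return toy_fence_done(r->excl);

    for (size_t i = 0; i < r->shared_count; i++)
        if (!toy_fence_done(r->shared[i]))
            return false;
    return true;
}

/* Workaround: additionally check the exclusive fence before freeing. */
bool toy_resv_idle_with_excl(const struct toy_resv *r)
{
    return toy_fence_done(r->excl) && toy_resv_idle_shared_only(r);
}
```

With an unsignaled exclusive fence and an already signaled move fence added as a shared fence, `toy_resv_idle_shared_only()` reports the object as idle although work is still pending, while `toy_resv_idle_with_excl()` does not.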

We could enable this feature for radeon and nouveau as well with a one-line change. But that would mean we need to maintain the workarounds for the shortcomings of the dma_resv object design in those drivers as well.

To summarize, I think that adding an unbound fence to protect an object is a perfectly valid operation for resource management, but this is restricted by the needs of implicit sync at the moment.

The kernel move fences otoh are a bit more nasty to wring through the
p2p dma-buf interface. That one probably needs something new.


Well the p2p interface is my least concern.

Adding the move fence means that you need to touch every place we do CS or page flip, since you now have something which is parallel to the explicit sync fence.

Otherwise having the move fence separately wouldn't make much sense in the first place if we always set it together with the exclusive fence.


Best regards and sorry for getting on your nerves so much,
Christian.

Are you bored enough to type this up for radv? I'll give Jason's kernel
stuff another review meanwhile.
-Daniel

                  e->bo_va = amdgpu_vm_bo_find(vm, bo);
          }
--
2.31.0

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch/



Re: [Mesa-dev] [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules
