[Mesa-dev] [PATCH 0/7] dma-buf: Add an API for exporting sync files (v11)

2021-05-25 Thread Jason Ekstrand
Modern userspace APIs like Vulkan are built on an explicit
synchronization model.  This doesn't always play nicely with the
implicit synchronization used in the kernel and assumed by X11 and
Wayland.  The client -> compositor half of the synchronization isn't too
bad, at least on intel, because we can control whether or not i915
synchronizes on the buffer and whether or not it's considered written.

The harder part is the compositor -> client synchronization when we get
the buffer back from the compositor.  We're required to be able to
provide the client with a VkSemaphore and VkFence representing the point
in time where the window system (compositor and/or display) finished
using the buffer.  With current APIs, it's very hard to do this in such
a way that we don't get confused by the Vulkan driver's access of the
buffer.  In particular, once we tell the kernel that we're rendering to
the buffer again, any CPU waits on the buffer or GPU dependencies will
wait on some of the client rendering and not just the compositor.

This new IOCTL solves this problem by allowing us to get a snapshot of
the implicit synchronization state of a given dma-buf in the form of a
sync file.  It's effectively the same as a poll() or I915_GEM_WAIT, only
instead of CPU waiting directly, it encapsulates the wait operation, at
the current moment in time, in a sync_file so we can check/wait on it
later.  As long as the Vulkan driver does the sync_file export from the
dma-buf before we re-introduce it for rendering, it will only contain
fences from the compositor or display.  This allows us to accurately turn
it into a VkFence or VkSemaphore without any over-synchronization.
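
To make this concrete, here is a rough userspace sketch of how a Vulkan
driver could snapshot a buffer's implicit fences with the proposed export
IOCTL and hand the result to the client as a semaphore.  The struct layout,
ioctl number and flag semantics are assumptions based on this series (and
the uAPI later merged as DMA_BUF_IOCTL_EXPORT_SYNC_FILE); check the actual
patches or your linux/dma-buf.h before relying on them.

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>
#include <vulkan/vulkan.h>

#ifndef DMA_BUF_IOCTL_EXPORT_SYNC_FILE
/* Fallback definitions assumed from this series, for older headers. */
struct dma_buf_export_sync_file {
	__u32 flags;	/* DMA_BUF_SYNC_READ and/or DMA_BUF_SYNC_WRITE */
	__s32 fd;	/* out: sync_file fd */
};
#define DMA_BUF_IOCTL_EXPORT_SYNC_FILE \
	_IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
#endif

/* Snapshot the dma-buf's current implicit fences into a sync_file fd. */
static int snapshot_implicit_fences(int dmabuf_fd)
{
	struct dma_buf_export_sync_file args = {
		/* SYNC_WRITE: include everything a new write would wait on,
		 * i.e. all current readers and writers. */
		.flags = DMA_BUF_SYNC_WRITE,
		.fd = -1,
	};

	if (ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &args) < 0)
		return -1;
	return args.fd;
}

/* Import the exported sync_file into a binary VkSemaphore for the client. */
static VkResult hand_fences_to_client(VkDevice dev, VkSemaphore sem, int sync_fd)
{
	PFN_vkImportSemaphoreFdKHR import_fd = (PFN_vkImportSemaphoreFdKHR)
		vkGetDeviceProcAddr(dev, "vkImportSemaphoreFdKHR");
	const VkImportSemaphoreFdInfoKHR info = {
		.sType = VK_STRUCTURE_TYPE_IMPORT_SEMAPHORE_FD_INFO_KHR,
		.semaphore = sem,
		.flags = VK_SEMAPHORE_IMPORT_TEMPORARY_BIT,
		.handleType = VK_EXTERNAL_SEMAPHORE_HANDLE_TYPE_SYNC_FD_BIT,
		.fd = sync_fd,	/* ownership moves to the driver on success */
	};

	if (!import_fd)
		return VK_ERROR_EXTENSION_NOT_PRESENT;
	return import_fd(dev, &info);
}

The key property is the one described above: as long as the export happens
before the buffer is handed back to the GPU for new rendering, the resulting
sync_file only contains compositor/display fences.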

This patch series actually contains two new ioctls.  There is the export
one mentioned above as well as an RFC for an import ioctl which provides
the other half.  The intention is to land the export ioctl since it seems
like there's no real disagreement on that one.  The import ioctl, however,
has a lot of debate around it so it's intended to be RFC-only for now.
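
For completeness, a similarly hedged sketch of the RFC import direction.
Since this half is explicitly RFC, the struct name, ioctl number and flag
semantics below are assumptions taken from the series and may well change.

#include <sys/ioctl.h>
#include <linux/dma-buf.h>

#ifndef DMA_BUF_IOCTL_IMPORT_SYNC_FILE
/* Fallback definitions assumed from the RFC patch. */
struct dma_buf_import_sync_file {
	__u32 flags;	/* DMA_BUF_SYNC_READ and/or DMA_BUF_SYNC_WRITE */
	__s32 fd;	/* in: sync_file to attach to the dma-buf */
};
#define DMA_BUF_IOCTL_IMPORT_SYNC_FILE \
	_IOW(DMA_BUF_BASE, 3, struct dma_buf_import_sync_file)
#endif

/*
 * Attach an explicit-sync fence back into the dma-buf's implicit sync
 * state, e.g. after client rendering, so implicit-sync consumers
 * (compositor, display) will wait on it.
 */
static int attach_sync_file(int dmabuf_fd, int sync_fd)
{
	struct dma_buf_import_sync_file args = {
		.flags = DMA_BUF_SYNC_WRITE,	/* treat it like a write */
		.fd = sync_fd,
	};

	return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &args);
}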

Mesa MR: https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4037
IGT tests: https://patchwork.freedesktop.org/series/90490/

v10 (Jason Ekstrand, Daniel Vetter):
 - Add reviews/acks
 - Add a patch to rename _rcu to _unlocked
 - Split things better so import is clearly RFC status

v11 (Daniel Vetter):
 - Add more CCs to try and get maintainers
 - Add a patch to document DMA_BUF_IOCTL_SYNC
 - Generally better docs
 - Use separate structs for import/export (easier to document)
 - Fix an issue in the import patch

Cc: Christian König 
Cc: Michel Dänzer 
Cc: Dave Airlie 
Cc: Bas Nieuwenhuizen 
Cc: Daniel Stone 
Cc: mesa-dev@lists.freedesktop.org
Cc: wayland-de...@lists.freedesktop.org
Test-with: 20210524205225.872316-1-ja...@jlekstrand.net

Christian König (1):
  dma-buf: Add dma_fence_array_for_each (v2)

Jason Ekstrand (6):
  dma-buf: Rename dma_resv helpers from _rcu to _unlocked (v2)
  dma-buf: Add dma_resv_get_singleton_unlocked (v5)
  dma-buf: Document DMA_BUF_IOCTL_SYNC
  dma-buf: Add an API for exporting sync files (v11)
  RFC: dma-buf: Add an extra fence to dma_resv_get_singleton_unlocked
  RFC: dma-buf: Add an API for importing sync files (v7)

 Documentation/driver-api/dma-buf.rst  |   8 +
 drivers/dma-buf/dma-buf.c | 107 +-
 drivers/dma-buf/dma-fence-array.c |  27 
 drivers/dma-buf/dma-resv.c| 139 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_display.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c   |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c   |   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c|   4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c   |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c|  14 +-
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c |   6 +-
 drivers/gpu/drm/drm_gem.c |  10 +-
 drivers/gpu/drm/drm_gem_atomic_helper.c   |   2 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem.c |   7 +-
 drivers/gpu/drm/etnaviv/etnaviv_gem_submit.c  |   8 +-
 drivers/gpu/drm/i915/display/intel_display.c  |   2 +-
 drivers/gpu/drm/i915/dma_resv_utils.c |   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_busy.c  |   2 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h|   2 +-
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c   |   4 +-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c  |  10 +-
 drivers/gpu/drm/i915/i915_request.c   |   6 +-
 drivers/gpu/drm/i915/i915_sw_fence.c  |   4 +-
 drivers/gpu/drm/msm/msm_gem.c |   3 +-
 drivers/gpu/drm/nouveau/dispnv50/wndw.c   |   2 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c |   4 +-
 drivers/gpu/drm/panfrost/panfrost_drv.c   |   4 +-
 drivers/gpu/drm/panfrost/panfrost_job.c   |   2 +-
 

Re: [Mesa-dev] [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules

2021-05-25 Thread Daniel Vetter
On Tue, May 25, 2021 at 5:05 PM Christian König wrote:
>
> Hi Daniel,
>
> On 25.05.21 at 15:05, Daniel Vetter wrote:
> > Hi Christian,
> >
> > On Sat, May 22, 2021 at 10:30:19AM +0200, Christian König wrote:
> >> On 21.05.21 at 20:31, Daniel Vetter wrote:
> >> This works by adding the fence of the last eviction DMA operation to BOs
> >> when their backing store is newly allocated. That's what the
> >> ttm_bo_add_move_fence() function you stumbled over is good for: 
> >> https://elixir.bootlin.com/linux/v5.13-rc2/source/drivers/gpu/drm/ttm/ttm_bo.c#L692
> >>
> >> Now the problem is it is possible that the application is terminated before
> >> it can complete its command submission. But since resource management only
> >> waits for the shared fences when there are some, there is a chance that we
> >> free up memory while it is still in use.
> > Hm, where is this code? It would help in the audit that I wanted to do this
> > week. If you look at most other places, like
> > drm_gem_fence_array_add_implicit() which I mentioned earlier, then we don't
> > treat the shared fences specially and always also include the exclusive one.
>
> See amdgpu_gem_object_close():
>
> ...
>         /* grab the BO's current exclusive fence */
>         fence = dma_resv_get_excl(bo->tbo.base.resv);
>         if (fence) {
>                 /* re-add it as a shared fence (shared = true) */
>                 amdgpu_bo_fence(bo, fence, true);
>                 fence = NULL;
>         }
> ...
>
> We explicitly added that because resource management of some other
> driver was going totally bananas without that.
>
> But I'm not sure which one that was. Maybe dig a bit in the git and
> mailing history of that.

Hm I looked and it's

commit 82c416b13cb7d22b96ec0888b296a48dff8a09eb
Author: Christian König 
Date:   Thu Mar 12 12:03:34 2020 +0100

   drm/amdgpu: fix and cleanup amdgpu_gem_object_close v4

That sounded more like amdgpu itself needing this, not another driver?

But looking at amdgpu_vm_bo_update_mapping() it seems to pick the
right fencing mode for gpu pte clearing, so I'm really not sure what
the bug was that you worked around here? The implementation boils down
to amdgpu_sync_resv() which syncs for the exclusive fence, always. And
there's nothing else that I could find in public history at least, no
references to bug reports or anything. I think you need to dig
internally, because as-is I'm not seeing the problem here.

Or am I missing something here?
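
For readers following along, the "other places" referenced above, like
drm_gem_fence_array_add_implicit(), boil down to roughly the sketch below,
simplified from the v5.13-era drm_gem.c: readers only wait on the exclusive
fence, writers wait on the exclusive fence and every shared fence.  This is
an illustration, not the verbatim kernel code.

#include <drm/drm_gem.h>
#include <linux/dma-resv.h>
#include <linux/dma-fence.h>
#include <linux/xarray.h>
#include <linux/slab.h>

/* Collect the implicit fences a new job on @obj must wait on. */
static int gem_collect_implicit_fences(struct xarray *fence_array,
				       struct drm_gem_object *obj,
				       bool write)
{
	struct dma_fence **fences;
	unsigned int i, fence_count;
	int ret;

	if (!write) {
		/* A read only has to wait for the exclusive (writer) fence. */
		struct dma_fence *fence = dma_resv_get_excl_rcu(obj->resv);

		return drm_gem_fence_array_add(fence_array, fence);
	}

	/* A write waits for the exclusive fence and all shared fences. */
	ret = dma_resv_get_fences_rcu(obj->resv, NULL, &fence_count, &fences);
	if (ret || !fence_count)
		return ret;

	for (i = 0; i < fence_count; i++) {
		ret = drm_gem_fence_array_add(fence_array, fences[i]);
		if (ret)
			break;
	}

	for (; i < fence_count; i++)
		dma_fence_put(fences[i]);
	kfree(fences);
	return ret;
}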
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [Mesa-dev] [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules

2021-05-25 Thread Christian König

Hi Daniel,

On 25.05.21 at 15:05, Daniel Vetter wrote:

> Hi Christian,
>
> On Sat, May 22, 2021 at 10:30:19AM +0200, Christian König wrote:
> > On 21.05.21 at 20:31, Daniel Vetter wrote:
> > > [SNIP]
> > > > We could provide an IOCTL for the BO to change the flag.
> > > That's not the semantics we need.
> > >
> > > > But could we first figure out the semantics we want to use here?
> > > >
> > > > Cause I'm pretty sure we don't actually need those changes at all and as
> > > > said before I'm certainly NAKing things which break existing use cases.
> > > Please read how other drivers do this and at least _try_ to understand
> > > it. I'm really losing my patience here with you NAKing patches you're
> > > not even understanding (or did you actually read and fully understand
> > > the entire story I typed up here, and your NAK is on the entire
> > > thing?). There's not much useful conversation to be had with that
> > > approach. And with drivers I mean kernel + userspace here.
> >
> > Well to be honest I did fully read that, but I was just too emotionally
> > attached to answer more appropriately in that moment.
> >
> > And I'm sorry that I react emotionally to that, but it is really frustrating
> > that I'm not able to convince you that we have a major problem which affects
> > all drivers and not just amdgpu.
> >
> > Regarding the reason why I'm NAKing this particular patch, you are breaking
> > existing uAPI for RADV with that. And as a maintainer of the driver I have
> > simply no other choice than saying halt, stop we can't do it like this.
> >
> > I'm perfectly aware that I've some holes in the understanding of how ANV or
> > other Vulkan/OpenGL stacks work. But you should probably also admit that you
> > have some holes how amdgpu works or otherwise I can't imagine why you
> > suggest a patch which simply breaks RADV.
> >
> > I mean we are working together for years now and I think you know me pretty
> > well, do you really think I scream bloody hell we can't do this without a
> > good reason?
> >
> > So let's stop throwing half-baked solutions at each other and discuss what
> > we can do to solve the different problems we are both seeing here.
>
> Well this was meant to be a goal post/semantics discussion starter. Yes
> the patch breaks performance (but not correctness) for amdgpu, but it also
> contains my suggestion for how to fix that issue (in text form at least).


Well as far as I can see this really breaks uAPI; we have unit tests
exercising this.


But see more on this below.


> Plus a plan how to roll it out so that anyone who cares doesn't hit the
> perf issues this patch can cause.
>
> Also the overall series is really meant as a subsystem wide assessment of
> the status quo. Aside from a correctness issue Lucas spotted in my panfrost
> patches, there's been no substantial input from others on this yet,
> unfortunately. I need to poke more people I think.
>
> Anyway, since the plan as text didn't stick, I'm now typing up something
> more detailed in the form of amdgpu patches. Maybe Bas can do the radv
> conversion too.
>
> It won't be complete by far either (I'm not working for amd after all
> ...), I'll leave out the opengl/media side entirely. But if this works for
> radv it should be a useful blueprint for gl/media too (with some changes
> in the interfaces, but not really the exposed semantics).


Yeah, but to make my point clear once more: I can't allow any patch in
that would change amdgpu's existing uAPI behavior.

What we can talk about is changing the behavior by adding flags to the
fpriv, and/or stuffing the CS fence into the exclusive slot by default.

For the latter I think we could do something like using a dma_fence_chain
for the exclusive fence in amdgpu. This way we would keep the same
semantics in the CS and still support epoll and Jason's new import IOCTL.
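
A hedged sketch of what that could look like, using v5.13-era dma-buf
helpers; the function name and seqno handling are made up for illustration,
and this is not a proposal-quality patch:

#include <linux/dma-fence-chain.h>
#include <linux/dma-resv.h>
#include <linux/slab.h>

/*
 * Illustrative only: chain the new CS fence onto the previous exclusive
 * fence instead of replacing it, so the exclusive slot keeps the full
 * history. Caller must hold the dma_resv lock.
 */
static int amdgpu_chain_excl_fence_sketch(struct dma_resv *resv,
					  struct dma_fence *cs_fence,
					  u64 seqno)
{
	struct dma_fence *prev = dma_resv_get_excl(resv);
	struct dma_fence_chain *chain;

	chain = kmalloc(sizeof(*chain), GFP_KERNEL);
	if (!chain)
		return -ENOMEM;

	/* dma_fence_chain_init() takes over the references passed in. */
	dma_fence_chain_init(chain, dma_fence_get(prev),
			     dma_fence_get(cs_fence), seqno);

	/* dma_resv_add_excl_fence() grabs its own reference. */
	dma_resv_add_excl_fence(resv, &chain->base);
	dma_fence_put(&chain->base);
	return 0;
}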


But the semantics of the amdgpu CS interface, not serializing within the
same process and always serializing when you see a different process, must
stay the same, or otherwise I have quite a bunch of angry end users.



> > > That's the other frustration part: You're trying to fix this purely in
> > > the kernel. This is exactly one of these issues why we require open
> > > source userspace, so that we can fix the issues correctly across the
> > > entire stack. And meanwhile you're steadfastly refusing to even look
> > > at the userspace side of the picture.
> >
> > Well I do fully understand the userspace side of the picture for the AMD
> > stack. I just don't think we should give userspace that much control over
> > the fences in the dma_resv object without untangling them from resource
> > management.
> >
> > And RADV is exercising exclusive sync for amdgpu already. You can do
> > submission to both the GFX, Compute and SDMA queues in Vulkan and those
> > currently won't over-synchronize.
> >
> > When you then send a texture generated by multiple engines to the Compositor,
> > the kernel will correctly insert waits for all submissions of the other
> > process.
> >
> > So this already works for RADV and completely without the IOCTL Jason
> > proposed. IIRC we also have unit tests which exercised that feature for the
> > video decoding use case long before RADV even existed.

> Yeah multiple engines on 

[Mesa-dev] Libdrm-nouveau2

2021-05-25 Thread Erik Romero
Hello,
I am installing the latest version of Mesa on a Raspberry Pi 4 in order to
update the existing OpenGL version string, 2.1 Mesa 19.3.2. I am building
an application for Kinect using libfreenect2.

When I run the meson .. command, I get this error:

Message: libdrm 2.4.102 needed because nouveau has the highest requirement
Run-time dependency libdrm_nouveau found: NO (tried pkgconfig and cmake)

../meson.build:1504:4: ERROR: Dependency "libdrm_nouveau" not found, tried
pkgconfig and cmake

and there is already a libdrm-nouveau2 driver installed, but I do not know
how to handle this issue: whether I have to go back to libdrm 2.4, or
configure the meson build to use libdrm2.

Thank you very much.

-- 
Erik ROMERO




Re: [Mesa-dev] [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules

2021-05-25 Thread Daniel Vetter
Hi Christian,

On Sat, May 22, 2021 at 10:30:19AM +0200, Christian König wrote:
> On 21.05.21 at 20:31, Daniel Vetter wrote:
> > [SNIP]
> > > We could provide an IOCTL for the BO to change the flag.
> > That's not the semantics we need.
> > 
> > > But could we first figure out the semantics we want to use here?
> > > 
> > > Cause I'm pretty sure we don't actually need those changes at all and as
> > > said before I'm certainly NAKing things which break existing use cases.
> > Please read how other drivers do this and at least _try_ to understand
> > it. I'm really losing my patience here with you NAKing patches you're
> > not even understanding (or did you actually read and fully understand
> > the entire story I typed up here, and your NAK is on the entire
> > thing?). There's not much useful conversation to be had with that
> > approach. And with drivers I mean kernel + userspace here.
> 
> Well to be honest I did fully read that, but I was just too emotionally
> attached to answer more appropriately in that moment.
> 
> And I'm sorry that I react emotionally to that, but it is really frustrating
> that I'm not able to convince you that we have a major problem which affects
> all drivers and not just amdgpu.
> 
> Regarding the reason why I'm NAKing this particular patch, you are breaking
> existing uAPI for RADV with that. And as a maintainer of the driver I have
> simply no other choice than saying halt, stop we can't do it like this.
> 
> I'm perfectly aware that I've some holes in the understanding of how ANV or
> other Vulkan/OpenGL stacks work. But you should probably also admit that you
> have some holes how amdgpu works or otherwise I can't imagine why you
> suggest a patch which simply breaks RADV.
> 
> I mean we are working together for years now and I think you know me pretty
> well, do you really think I scream bloody hell we can't do this without a
> good reason?
> 
> So let's stop throwing half-baked solutions at each other and discuss what
> we can do to solve the different problems we are both seeing here.

Well this was meant to be a goal post/semantics discussion starter. Yes
the patch breaks performance (but not correctness) for amdgpu, but it also
contains my suggestion for how to fix that issue (in text form at least).

Plus a plan how to roll it out so that anyone who cares doesn't hit the
perf issues this patch can cause.

Also the overall series is really meant as a subsystem wide assessment of
the status quo. Aside from a correctness issue Lucas spotted in my panfrost
patches, there's been no substantial input from others on this yet,
unfortunately. I need
to poke more people I think.

Anyway, since the plan as text didn't stick, I'm now typing up something
more detailed in the form of amdgpu patches. Maybe Bas can do the radv
conversion too.

It won't be complete by far either (I'm not working for amd after all
...), I'll leave out the opengl/media side entirely. But if this works for
radv it should be a useful blueprint for gl/media too (with some changes
in the interfaces, but not really the exposed semantics).

> > That's the other frustration part: You're trying to fix this purely in
> > the kernel. This is exactly one of these issues why we require open
> > source userspace, so that we can fix the issues correctly across the
> > entire stack. And meanwhile you're steadfastly refusing to even look
> > at the userspace side of the picture.
> 
> Well I do fully understand the userspace side of the picture for the AMD
> stack. I just don't think we should give userspace that much control over
> the fences in the dma_resv object without untangling them from resource
> management.
> 
> And RADV is exercising exclusive sync for amdgpu already. You can do
> submission to both the GFX, Compute and SDMA queues in Vulkan and those
> currently won't over-synchronize.
> 
> When you then send a texture generated by multiple engines to the Compositor,
> the kernel will correctly insert waits for all submissions of the other
> process.
> 
> So this already works for RADV and completely without the IOCTL Jason
> proposed. IIRC we also have unit tests which exercised that feature for the
> video decoding use case long before RADV even existed.

Yeah multiple engines on the producer side works fine with the current
scheme you have (if we ignore for now that the way amdgpu uses dma_resv is
different from other drivers by not setting the exclusive slots for
producers).
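
To make the convention above concrete, it boils down to roughly this
driver-agnostic sketch, using v5.13-era dma_resv API names; illustrative
only, not amdgpu's actual code:

#include <linux/dma-resv.h>
#include <linux/dma-fence.h>

/* Publish a CS fence for implicit sync: writers go into the exclusive
 * slot so later readers and writers wait on them; readers only go into
 * the shared slots so other readers don't wait. */
static int publish_cs_fence_sketch(struct dma_resv *resv,
				   struct dma_fence *cs_fence,
				   bool wrote_buffer)
{
	int ret;

	ret = dma_resv_lock(resv, NULL);
	if (ret)
		return ret;

	if (wrote_buffer) {
		dma_resv_add_excl_fence(resv, cs_fence);
	} else {
		ret = dma_resv_reserve_shared(resv, 1);
		if (!ret)
			dma_resv_add_shared_fence(resv, cs_fence);
	}
	dma_resv_unlock(resv);
	return ret;
}

amdgpu instead decides based on fence ownership (never serialize within the
same process, always serialize against a different one, as Christian
describes above), which is why the overlapping-reads case below
over-synchronizes.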

Where it breaks down is you have overlapping reads once a frame is
generated, on either side. E.g.

- compositors read the buffer as consumer
- but also the producer reads the buffer again (maybe a reference frame for
  media, or maybe for some post-processing like motion blur or whatever).

Then your current scheme really badly oversyncs, while other drivers
don't have this issue. Fixing this for amdgpu will have quite a big impact
on how dma_resv will be used by amdgpu, and that's why I think before
we've looked at this it doesn't make sense