On 24.06.24 at 21:08, James Jones wrote:
FWIW, the NVIDIA binary driver's implementation of gbm_bo_map/unmap():
1) We don't do any synchronization against in-flight work. The
assumption is that if the content is going to be read, the API that
wrote the data has already established coherence. Likewise, if it's
going to be written, the API reading it afterwards does whatever
invalidates are needed for coherence.
That matches my assumption of what this function does, but it is just
the opposite of what Michel explained.
Is it documented somewhere whether gbm_bo_map() should wait for
in-flight work or not?
Regards,
Christian.
2) We don't blit anything or format convert, because our GBM
implementation has no DMA engine access, and I'd like to keep it that
way. Setting up a DMA-capable driver instance is much more expensive
in terms of runtime resources than setting up a simple allocator+mmap
driver, at least in our driver architecture. Our GBM map just does an
mmap(), and if it's not linear, you're not going to be able to
interpret the data unless you've read up on our tiling formats. I'm
aware this is different from Mesa, and no one has complained thus far.
If we were forced to fix it, I imagine we'd do something like ask a
shared engine in the kernel to do the blit on userspace's behalf,
which would probably be slow but save resources.
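To illustrate the "just an mmap()" point, here is a minimal sketch of a
CPU readback that refuses to touch a mapping unless the layout is linear.
It is not NVIDIA's or Mesa's code. The helper name copy_linear_rows() and
the SKETCH_MOD_LINEAR constant are made up for this example; the constant
mirrors DRM_FORMAT_MOD_LINEAR from <drm_fourcc.h>, and the modifier is
assumed to come from something like gbm_bo_get_modifier().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors DRM_FORMAT_MOD_LINEAR from <drm_fourcc.h>; defined locally
 * (hypothetical name) so the sketch builds without libdrm headers. */
#define SKETCH_MOD_LINEAR UINT64_C(0)

/*
 * Copy `height` rows of `row_bytes` bytes out of a mapped buffer,
 * honouring the source and destination strides.
 * Returns 0 on success, -1 if the layout is not linear or the
 * row does not fit into one of the strides.
 */
int copy_linear_rows(uint64_t modifier,
                     const uint8_t *src, size_t src_stride,
                     uint8_t *dst, size_t dst_stride,
                     size_t row_bytes, size_t height)
{
    assert(src != NULL && dst != NULL);

    /* A tiled mapping is just opaque bytes to a generic CPU reader. */
    if (modifier != SKETCH_MOD_LINEAR)
        return -1;
    if (row_bytes > src_stride || row_bytes > dst_stride)
        return -1;

    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, row_bytes);
    return 0;
}
```

A caller would fall back to some other path (for example a GL readback)
when this returns -1 rather than trying to decode the tiled data.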
Basically, don't use gbm_bo_map() for anything non-trivial on our
implementation. It's not the right tool for e.g., reading back or
populating OpenGL textures or X pixmaps. If you don't want to run on
the NV implementation, feel free to ignore this advice, but I'd still
suggest it's not the best tool for most jobs.
Thanks,
-James
On 6/17/24 03:29, Pierre Ossman wrote:
On 17/06/2024 10:13, Christian König wrote:
Let me try to clarify a couple of things:
The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so
that the GPU can see values written by the CPU and the CPU can see
values written by the GPU. But that IOCTL does *not* wait for any
async GPU operation to finish.
If you want to wait for async GPU operations you either need to call
the OpenGL functions to read pixels or do a select() (or poll, epoll
etc...) call on the DMA-buf file descriptor.
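As a rough sketch of the "wait on the file descriptor" option, assuming
plain poll() rather than select(): the helper name wait_fd_readable() is
made up for this example. On a DMA-buf fd, polling for POLLIN is
understood to wait for pending writes to the buffer, which is what a CPU
reader cares about.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Block until `fd` reports POLLIN or the timeout expires.
 * timeout_ms follows poll(): -1 waits forever, 0 returns immediately.
 * Returns 1 when readable, 0 on timeout, -1 on error.
 */
int wait_fd_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    assert(timeout_ms >= -1);

    /* Restart the wait if a signal interrupts it. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret <= 0)
        return ret;

    /* poll() can also wake up with only POLLERR/POLLNVAL set. */
    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

Note that this only waits; it does not flush or invalidate any caches,
so it complements DMA_BUF_IOCTL_SYNC rather than replacing it.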
Thanks for the clarification!
Just to avoid any uncertainty, are both of these things done
implicitly by gbm_bo_map()/gbm_bo_unmap()?
I did test adding those steps just in case, but unfortunately did not
see an improvement. My order was:
1. gbm_bo_import(GBM_BO_USE_RENDERING)
2. gbm_bo_get_fd()
3. Wait for client to request displaying the buffer
4. gbm_bo_map(GBM_BO_TRANSFER_READ)
5. select(fd+1, &fds, NULL, NULL, NULL)
6. ioctl(DMA_BUF_IOCTL_SYNC, &{ .flags = DMA_BUF_SYNC_START |
DMA_BUF_SYNC_READ })
7. pixman_blt()
8. gbm_bo_unmap()
So if you want to do some rendering with OpenGL and then see the
result in a buffer memory mapping the correct sequence would be the
following:
1. Issue OpenGL rendering commands.
2. Call glFlush() to make sure the hw actually starts working on the
rendering.
3. Call select() on the DMA-buf file descriptor to wait for the
rendering to complete.
4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.
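Step 4 could be sketched roughly as below. This assumes the
struct dma_buf_sync and DMA_BUF_SYNC_* definitions from
<linux/dma-buf.h>; the helper name dmabuf_cpu_access() is made up for
this example. It does not wait for rendering, so step 3 still has to
happen first.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Mark the start or end of a CPU access window on a DMA-buf.
 * phase: DMA_BUF_SYNC_START or DMA_BUF_SYNC_END
 * dir:   DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW
 * Returns 0 on success, -1 on error with errno set by ioctl().
 */
int dmabuf_cpu_access(int dmabuf_fd, uint64_t phase, uint64_t dir)
{
    struct dma_buf_sync sync = { .flags = phase | dir };
    int ret;

    assert(phase == DMA_BUF_SYNC_START || phase == DMA_BUF_SYNC_END);
    assert(dir == DMA_BUF_SYNC_READ || dir == DMA_BUF_SYNC_WRITE ||
           dir == DMA_BUF_SYNC_RW);

    /* Retry if the call was interrupted before it completed. */
    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}
```

A CPU read would then be bracketed by a call with DMA_BUF_SYNC_START
before touching the mapping and a matching DMA_BUF_SYNC_END afterwards,
both with DMA_BUF_SYNC_READ.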
What I want to do is implement the X server side of DRI3 in just CPU.
It works for every application I've tested except gnome-shell.
I would assume that 1. and 2. are supposed to be done by the X
client, i.e. gnome-shell?
What I need to be able to do is access the result of that, once the X
client tries to draw using that GBM backed pixmap (e.g. using
PresentPixmap).
So far, we've only tested Intel GPUs, but we are setting up Nvidia
and AMD GPUs at the moment. It will be interesting to see if the
issue remains on those or not.
Regards