FWIW, two notes on the NVIDIA binary driver's implementation of gbm_bo_map()/gbm_bo_unmap():

1) We don't do any synchronization against in-flight work. The assumption is that if the content is going to be read, the API that wrote the data has already established that coherence. Likewise, if it's going to be written, the API that reads it afterwards performs whatever invalidation is needed for coherence.

2) We don't blit or format-convert anything, because our GBM implementation has no DMA engine access, and I'd like to keep it that way. Setting up a DMA-capable driver instance is much more expensive in terms of runtime resources than setting up a simple allocator+mmap driver, at least in our driver architecture. Our GBM map just does an mmap(), and if the buffer isn't linear, you won't be able to interpret the data unless you've read up on our tiling formats. I'm aware this is different from Mesa, and no one has complained thus far. If we were forced to fix it, I imagine we'd do something like asking a shared engine in the kernel to do the blit on userspace's behalf, which would probably be slow but would save resources.
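If you want to guard against that, checking the buffer's format modifier before trusting a mapping is one option. A minimal sketch (not our driver code; it assumes a gbm_bo obtained elsewhere and libdrm's drm_fourcc.h):

#include <stdint.h>
#include <gbm.h>
#include <drm_fourcc.h>   /* from libdrm, for DRM_FORMAT_MOD_* */

/* Returns nonzero if the bytes from gbm_bo_map() can be read as plain
 * linear pixels on an implementation that maps the raw allocation, as
 * described above.  A non-linear modifier means the mapping is in a
 * vendor tiling layout; DRM_FORMAT_MOD_INVALID means the layout is
 * simply unknown, which gives no guarantee either way. */
static int bo_map_is_linear(struct gbm_bo *bo)
{
    return gbm_bo_get_modifier(bo) == DRM_FORMAT_MOD_LINEAR;
}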

Basically, don't use gbm_bo_map() for anything non-trivial on our implementation. It's not the right tool for, e.g., reading back or populating OpenGL textures or X pixmaps. If you don't care about running on the NVIDIA implementation, feel free to ignore this advice, but I'd still suggest it's not the best tool for most jobs.

Thanks,
-James

On 6/17/24 03:29, Pierre Ossman wrote:
On 17/06/2024 10:13, Christian König wrote:

Let me try to clarify a couple of things:

The DMA_BUF_IOCTL_SYNC function is to flush and invalidate caches so that the GPU can see values written by the CPU and the CPU can see values written by the GPU. But that IOCTL does *not* wait for any async GPU operation to finish.

If you want to wait for async GPU operations you either need to call the OpenGL functions to read pixels or do a select() (or poll, epoll etc...) call on the DMA-buf file descriptor.
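In rough C, those two steps look something like this (a sketch; 'fd' is the DMA-buf file descriptor, e.g. from gbm_bo_get_fd(), and error handling is abbreviated):

#include <errno.h>
#include <poll.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/* Wait for pending GPU writes, then begin a CPU read of the buffer.
 * POLLIN on a DMA-buf fd signals once the pending write fences have
 * signaled, i.e. once the rendering into the buffer is done. */
static int wait_and_begin_cpu_read(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    struct dma_buf_sync sync = {
        .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ,
    };

    while (poll(&pfd, 1, -1) < 0) {
        if (errno != EINTR)
            return -1;
    }
    /* Flush/invalidate caches so the CPU sees the GPU's writes. */
    return ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
}

/* End the CPU access once the reads are done. */
static int end_cpu_read(int fd)
{
    struct dma_buf_sync sync = {
        .flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ,
    };
    return ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);
}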


Thanks for the clarification!

Just to avoid any uncertainty, are both of these things done implicitly by gbm_bo_map()/gbm_bo_unmap()?

I did test adding those steps just in case, but unfortunately did not see an improvement. My order was (sketched in C after the list):

1. gbm_bo_import(GBM_BO_USE_RENDERING)
2. gbm_bo_get_fd()
3. Wait for the client to request displaying the buffer
4. gbm_bo_map(GBM_BO_TRANSFER_READ)
5. select(fd+1, &fds, NULL, NULL, NULL)
6. ioctl(fd, DMA_BUF_IOCTL_SYNC, &(struct dma_buf_sync){ .flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ })
7. pixman_blt()
8. gbm_bo_unmap()
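For reference, that sequence spelled out in C looks roughly like this. It's a sketch with error handling omitted; 'copy_rows' is a hypothetical stand-in for the pixman_blt() step, and I've added the DMA_BUF_SYNC_END before unmap that the dma-buf uAPI expects after CPU access (my list above doesn't have it):

#include <stdint.h>
#include <sys/select.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>
#include <gbm.h>

/* 'bo' was imported with GBM_BO_USE_RENDERING (steps 1-3 done
 * elsewhere); 'fd' came from gbm_bo_get_fd(). */
static void readback(struct gbm_bo *bo, int fd,
                     void (*copy_rows)(const void *src, uint32_t stride))
{
    uint32_t stride = 0;
    void *map_data = NULL;
    struct dma_buf_sync sync;
    fd_set fds;

    /* 4. Map for reading. */
    void *ptr = gbm_bo_map(bo, 0, 0,
                           gbm_bo_get_width(bo), gbm_bo_get_height(bo),
                           GBM_BO_TRANSFER_READ, &stride, &map_data);
    if (!ptr)
        return;

    /* 5. Wait for the buffer's GPU fences to signal. */
    FD_ZERO(&fds);
    FD_SET(fd, &fds);
    select(fd + 1, &fds, NULL, NULL, NULL);

    /* 6. Begin CPU access: make the GPU's writes CPU visible. */
    sync.flags = DMA_BUF_SYNC_START | DMA_BUF_SYNC_READ;
    ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);

    /* 7. Copy the pixels out (the pixman_blt() step). */
    copy_rows(ptr, stride);

    /* End CPU access (missing from the list above). */
    sync.flags = DMA_BUF_SYNC_END | DMA_BUF_SYNC_READ;
    ioctl(fd, DMA_BUF_IOCTL_SYNC, &sync);

    /* 8. Unmap. */
    gbm_bo_unmap(bo, map_data);
}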

So if you want to do some rendering with OpenGL and then see the result in a buffer memory mapping the correct sequence would be the following:

1. Issue OpenGL rendering commands.
2. Call glFlush() to make sure the hw actually starts working on the rendering (see the sketch after this list).
3. Call select() on the DMA-buf file descriptor to wait for the rendering to complete.
4. Use DMA_BUF_IOCTL_SYNC to make the rendering result CPU visible.
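Steps 3 and 4 are the poll()/DMA_BUF_IOCTL_SYNC pair sketched above; on the rendering side, step 2 is just a flush after the draw calls. A rough sketch, assuming a current GL context:

#include <GL/gl.h>

/* Steps 1-2 on the producer side: issue the rendering, then flush.
 * Without the flush, the commands can sit in the userspace command
 * buffer and the buffer's fences may not signal for a long time. */
static void submit_frame(void)
{
    /* ... glDraw*() calls for the frame ... */
    glFlush();
}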


What I want to do is implement the X server side of DRI3 using just the CPU. It works for every application I've tested except gnome-shell.

I would assume that 1. and 2. are supposed to be done by the X client, i.e. gnome-shell?

What I need to be able to do is access the result of that rendering once the X client tries to draw using that GBM-backed pixmap (e.g. using PresentPixmap).

So far, we've only tested Intel GPUs, but we are setting up NVIDIA and AMD GPUs at the moment. It will be interesting to see whether the issue remains on those as well.

Regards
