On 11/25/25 14:52, Pavel Begunkov wrote:
> On 11/24/25 14:17, Christian König wrote:
>> On 11/24/25 12:30, Pavel Begunkov wrote:
>>> On 11/24/25 10:33, Christian König wrote:
>>>> On 11/23/25 23:51, Pavel Begunkov wrote:
>>>>> Picking up the work on supporting dmabuf in the read/write path.
>>>>
>>>> IIRC that work was completely stopped because it violated core
>>>> dma_fence and DMA-buf rules and after some private discussion was
>>>> considered not doable in general.
>>>>
>>>> Or am I mixing something up here?
>>>
>>> The time gap is purely due to me being busy. I wasn't CC'ed to those
>>> private discussions you mentioned, but the v1 feedback was to use
>>> dynamic attachments and avoid passing dma address arrays directly.
>>>
>>> https://lore.kernel.org/all/[email protected]/
>>>
>>> I'm lost on what part is not doable. Can you elaborate on the core
>>> dma-fence dma-buf rules?
>>
>> I most likely mixed that up, in other words that was a different
>> discussion.
>>
>> When you use dma_fences to indicate async completion of events you
>> need to be super duper careful that you only do this for in flight
>> events, have the fence creation in the right order etc...
>
> I'm curious, what can happen if there is new IO using a
> move_notify()ed mapping, but let's say it's guaranteed to complete
> strictly before dma_buf_unmap_attachment() and the fence is signaled?
> Is there some loss of data or corruption that can happen?
The problem is that you can't guarantee that because you run into
deadlocks. As soon as a dma_fence is created and published by calling
add_fence, it can happen that memory management loops back and depends
on that fence. So you actually can't issue any new IO which might block
the unmap operation.

>
> sg_table = map_attach()    |
> move_notify()              |
>   -> add_fence(fence)      |
>                            | issue_IO(sg_table)
>                            | // IO completed
> unmap_attachment(sg_table) |
> signal_fence(fence)        |
>
>> For example once the fence is created you can't make any memory
>> allocations any more, that's why we have this dance of reserving
>> fence slots, creating the fence and then adding it.
>
> Looks like I have some terminology gap here. By "memory allocations"
> you don't mean kmalloc, right? I assume it's about new users of the
> mapping.

kmalloc() as well as get_free_page() is exactly what is meant here. You
can't make any memory allocation any more after creating/publishing a
dma_fence.

The usual flow is the following:

1. Lock the dma_resv object.
2. Prepare the I/O operation, make all memory allocations etc...
3. Allocate the dma_fence object.
4. Push the I/O operation to the HW, making sure that you don't
   allocate memory any more.
5. Call dma_resv_add_fence() with the fence allocated in #3.
6. Unlock the dma_resv object.

If you stray from that you will most likely end up in a deadlock sooner
or later.

Regards,
Christian.

>>>> Since I don't see any dma_fence implementation at all that might
>>>> actually be the case.
>>>
>>> See Patch 5, struct blk_mq_dma_fence. It's used in the move_notify
>>> callback and is signaled when all inflight IO using the current
>>> mapping is complete. All new IO requests will try to recreate the
>>> mapping, and hence potentially wait with dma_resv_wait_timeout().
>>
>> Without looking at the code that approach sounds more or less correct
>> to me.
>>
>>>> On the other hand we have direct I/O from DMA-buf working for quite
>>>> a while, just not upstream and without io_uring support.
>>>
>>> Have any reference?
>>
>> There is a WIP feature in AMD's GPU driver package for ROCm.
>>
>> But that can't be used as a general purpose DMA-buf approach, because
>> it makes use of internal knowledge about how the GPU driver is using
>> the backing store.
>
> Got it
>
>> BTW when you use DMA addresses from DMA-buf always keep in mind that
>> this memory can be written by others at the same time, e.g. you can't
>> do things like compute a CRC first, then write to the backing store
>> and finally compare the CRC.
>
> Right. The direct IO path also works with user pages, so the
> constraints are similar in this regard.
>
