On 11/25/25 14:52, Pavel Begunkov wrote:
> On 11/24/25 14:17, Christian König wrote:
>> On 11/24/25 12:30, Pavel Begunkov wrote:
>>> On 11/24/25 10:33, Christian König wrote:
>>>> On 11/23/25 23:51, Pavel Begunkov wrote:
>>>>> Picking up the work on supporting dmabuf in the read/write path.
>>>>
>>>> IIRC that work was completely stopped because it violated core
>>>> dma_fence and DMA-buf rules and after some private discussion was
>>>> considered not doable in general.
>>>>
>>>> Or am I mixing something up here?
>>>
>>> The time gap is purely due to me being busy. I wasn't CC'ed to those
>>> private discussions you mentioned, but the v1 feedback was to use
>>> dynamic attachments and avoid passing dma address arrays directly.
>>>
>>> https://lore.kernel.org/all/[email protected]/
>>>
>>> I'm lost on what part is not doable. Can you elaborate on the core
>>> dma-fence dma-buf rules?
>>
>> I most likely mixed that up, in other words that was a different
>> discussion.
>>
>> When you use dma_fences to indicate async completion of events you
>> need to be super duper careful that you only do this for in flight
>> events, have the fence creation in the right order etc...
>
> I'm curious, what can happen if there is new IO using a
> move_notify()ed mapping, but let's say it's guaranteed to complete
> strictly before dma_buf_unmap_attachment() and the fence is signaled?
> Is there some loss of data or corruption that can happen?
The problem is that you can't guarantee that because you run into
deadlocks. As soon as a dma_fence is created and published by calling
add_fence, it can happen that memory management loops back and depends
on that fence. So you actually can't issue any new IO which might block
the unmap operation.

>
> sg_table = map_attach()    |
> move_notify()              |
>   -> add_fence(fence)      |
>                            | issue_IO(sg_table)
>                            | // IO completed
> unmap_attachment(sg_table) |
> signal_fence(fence)        |
>
>> For example once the fence is created you can't make any memory
>> allocations any more, that's why we have this dance of reserving
>> fence slots, creating the fence and then adding it.
>
> Looks like I have some terminology gap here. By "memory allocations"
> you don't mean kmalloc, right? I assume it's about new users of the
> mapping.

kmalloc() as well as get_free_page() is exactly what is meant here. You
can't make any memory allocation any more after creating/publishing a
dma_fence.

The usual flow is the following:

1. Lock the dma_resv object.
2. Prepare the I/O operation, make all memory allocations etc...
3. Allocate the dma_fence object.
4. Push the I/O operation to the HW, making sure that you don't
   allocate memory any more.
5. Call dma_resv_add_fence() with the fence allocated in #3.
6. Unlock the dma_resv object.

If you stray from that you will most likely end up in a deadlock sooner
or later.

Regards,
Christian.

>>>> Since I don't see any dma_fence implementation at all that might
>>>> actually be the case.
>>>
>>> See Patch 5, struct blk_mq_dma_fence. It's used in the move_notify
>>> callback and is signaled when all inflight IO using the current
>>> mapping is complete. All new IO requests will try to recreate the
>>> mapping, and hence potentially wait with dma_resv_wait_timeout().
>>
>> Without looking at the code that approach sounds more or less correct
>> to me.
>>
>>>> On the other hand we have direct I/O from DMA-buf working for quite
>>>> a while, just not upstream and without io_uring support.
>>>
>>> Have any reference?
>>
>> There is a WIP feature in AMD's GPU driver package for ROCm.
>>
>> But that can't be used as a general purpose DMA-buf approach, because
>> it makes use of internal knowledge about how the GPU driver is using
>> the backing store.
>
> Got it
>
>> BTW when you use DMA addresses from DMA-buf always keep in mind that
>> this memory can be written by others at the same time, e.g. you can't
>> do things like compute a CRC first, then write to the backing store
>> and finally compare the CRC.
>
> Right. The direct IO path also works with user pages, so the
> constraints are similar in this regard.
>
