On 2020-05-15 09:19, Song Bao Hua wrote:
[ snip... nice analysis, but ultimately it's still "doing stuff has more overhead than not doing stuff" ]

> I am considering several possible ways to reduce or remove the latency of DMA
> map/unmap for every single DMA transfer. Since "non-strict" is an existing
> option with possible safety issues, I won't discuss it in this mail.

But passthrough and non-strict mode *specifically exist* for the cases where performance is the most important concern - streaming DMA with an IOMMU in the middle has an unavoidable tradeoff between performance and isolation, so dismissing that out of hand is not a good way to start making this argument.

> 1. Provide coherent bounce buffers for streaming buffers.
> Since coherent buffers stay mapped, we can remove the map/unmap overhead for
> each DMA operation. However, this requires copying data between the streaming
> buffers and the bounce buffers, so it only pays off if the copy is faster than
> map/unmap. It will also consume much more memory bandwidth.
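Option 1, as I read it, trades the per-transfer map/unmap for a copy in and a copy out of one long-lived mapped buffer. A rough userspace model follows; it assumes hypothetical fake_iommu_map()/fake_iommu_unmap() stubs that only count calls in place of real dma_map_single()/dma_alloc_coherent() work, and a hypothetical fake_device_transfer() that just inverts bytes to stand in for the device:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096   /* hypothetical bounce buffer size */

/* Counting stubs standing in for IOMMU map/unmap; not kernel APIs. */
static unsigned long map_calls, unmap_calls;
static void fake_iommu_map(void *buf, size_t len)   { (void)buf; (void)len; map_calls++; }
static void fake_iommu_unmap(void *buf, size_t len) { (void)buf; (void)len; unmap_calls++; }

/* Stand-in for the device touching the DMA buffer: inverts every byte. */
static void fake_device_transfer(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = (unsigned char)~buf[i];
}

/* Status quo: map and unmap the caller's buffer around every transfer. */
static void transfer_streaming(unsigned char *buf, size_t len)
{
    fake_iommu_map(buf, len);
    fake_device_transfer(buf, len);
    fake_iommu_unmap(buf, len);
}

/* Option 1: one bounce buffer, mapped once and kept mapped. */
static unsigned char bounce[BOUNCE_SIZE];
static int bounce_mapped;

static void transfer_bounced(unsigned char *buf, size_t len)
{
    assert(len <= BOUNCE_SIZE);
    if (!bounce_mapped) {
        fake_iommu_map(bounce, BOUNCE_SIZE);   /* only on first use */
        bounce_mapped = 1;
    }
    memcpy(bounce, buf, len);                  /* extra copy in */
    fake_device_transfer(bounce, len);
    memcpy(buf, bounce, len);                  /* extra copy out */
}
```

The map count drops from one per transfer to one in total, but every transfer now pays two memcpy() calls of the payload size, which is the copy-versus-map/unmap and memory-bandwidth tradeoff described above.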

I'm struggling to understand how that would work; can you explain it in more detail?

> 2. Make upper-layer kernel components aware of the cost of IOMMU map/unmap.
> Upper layers such as fs, mm and networking could somehow tell the lower-layer
> drivers when the life cycle of an sg buffer ends. In the zswap case, I have
> seen zswap always use the same two pages as destination buffers for
> compressed pages, yet the compressor driver still has to map and unmap those
> same two pages for every single compression, because zswap and the zip
> driver work in two completely different software layers.

> For example, upper-layer kernel code could call:
> sg_init_table(&sg...);
> sg_mark_reusable(&sg....);
> .... /* use the buffer many times */
> ....
> sg_mark_stop_reuse(&sg);

> If a low-level driver sees the "reusable" flag, it knows the buffer will be
> used multiple times and skips the map/unmap on each transfer: the upper layer
> will keep using the buffer and will probably hand the same buffer to the
> lower-layer driver for new DMA transfers later. When the upper-layer code
> sets "stop_reuse", the lower-layer driver unmaps the sg buffers, possibly via
> an unmap callback provided to the upper-layer components. In the zswap case,
> a buffer is mapped again for the next transfer almost immediately after it
> is unmapped, so if zswap could set the "reusable" flag, the zip driver would
> save a lot of time.
> For the safety of the buffers, lower-layer drivers must make certain that
> the buffers have been unmapped from the IOMMU before they go back to the
> buddy allocator for other users.
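The proposed flag amounts to a bit of driver-side mapping state carried alongside the buffer. A minimal sketch under that assumption; struct fake_sg, driver_transfer() and driver_stop_reuse() are hypothetical names for illustration only, not the kernel's struct scatterlist or any existing API, and the IOMMU calls are counting stubs:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer carrying the proposed flag. */
struct fake_sg {
    void *buf;
    size_t len;
    bool reusable;   /* set by the upper layer (sg_mark_reusable() in the proposal) */
    bool mapped;     /* driver-side state: is an IOMMU mapping currently live? */
};

/* Counting stubs standing in for real IOMMU map/unmap. */
static unsigned long map_calls, unmap_calls;
static void fake_iommu_map(struct fake_sg *sg)   { sg->mapped = true;  map_calls++; }
static void fake_iommu_unmap(struct fake_sg *sg) { sg->mapped = false; unmap_calls++; }

/* Driver transfer path: keep the mapping alive while the buffer is reusable. */
static void driver_transfer(struct fake_sg *sg)
{
    if (!sg->mapped)
        fake_iommu_map(sg);
    /* ... device DMA would happen here ... */
    if (!sg->reusable)
        fake_iommu_unmap(sg);
}

/* Upper layer ends reuse: the mapping must be torn down before the pages
 * can be returned to the buddy allocator. */
static void driver_stop_reuse(struct fake_sg *sg)
{
    sg->reusable = false;
    if (sg->mapped)
        fake_iommu_unmap(sg);
}
```

In this model the driver holds a live mapping between transfers, so the buddy-allocator safety requirement depends entirely on every user reliably ending reuse before freeing the pages.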

That sounds like it would only have benefit in a very small set of specific circumstances, and would be very difficult to generalise to buffers that are mapped via dma_map_page() or dma_map_single(). Furthermore, a high-level API that affects a low-level driver's interpretation of mid-layer API calls without the mid-layer's knowledge sounds like a hideous abomination of anti-design. If a mid-layer API lends itself to inefficiency at the lower level, it would seem a lot cleaner and more robust to extend *that* API for stateful buffer reuse. Failing that, it might possibly be appropriate to approach this at the driver level - many of the cleverer network drivers already implement buffer pools to recycle mapped SKBs internally, couldn't the "zip driver" simply try doing something like that for itself?
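A driver-private pool along those lines might look roughly like the following: a fixed set of buffers mapped once at init, handed out and recycled per request, and unmapped only at teardown. This is a userspace sketch under assumptions; pool_init(), pool_get(), pool_put(), pool_destroy() and the counting fake_iommu_map()/fake_iommu_unmap() stubs are hypothetical and stand in for real dma_map_single()/dma_unmap_single() calls:

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4      /* hypothetical pool depth */
#define BUF_LEN   4096   /* hypothetical per-buffer size */

/* Counting stubs standing in for dma_map_single()/dma_unmap_single(). */
static unsigned long map_calls, unmap_calls;
static void fake_iommu_map(void *buf, size_t len)   { (void)buf; (void)len; map_calls++; }
static void fake_iommu_unmap(void *buf, size_t len) { (void)buf; (void)len; unmap_calls++; }

struct pool_buf {
    unsigned char data[BUF_LEN];
    int in_use;
};

struct buf_pool {
    struct pool_buf bufs[POOL_SIZE];
};

/* Map every buffer once, up front, and mark all of them free. */
static void pool_init(struct buf_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        p->bufs[i].in_use = 0;
        fake_iommu_map(p->bufs[i].data, BUF_LEN);
    }
}

/* Hand out an already-mapped buffer; NULL means the pool is exhausted and
 * the driver would fall back to mapping per transfer. */
static struct pool_buf *pool_get(struct buf_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!p->bufs[i].in_use) {
            p->bufs[i].in_use = 1;
            return &p->bufs[i];
        }
    }
    return NULL;
}

/* Return a buffer to the pool; its IOMMU mapping stays live. */
static void pool_put(struct pool_buf *b)
{
    assert(b->in_use);
    b->in_use = 0;
}

/* Unmap everything only when the driver tears the pool down. */
static void pool_destroy(struct buf_pool *p)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        assert(!p->bufs[i].in_use);
        fake_iommu_unmap(p->bufs[i].data, BUF_LEN);
    }
}
```

Everything stays inside the driver: the mid-layer API and its semantics are untouched, and the map/unmap cost is paid POOL_SIZE times for the lifetime of the pool rather than once per request.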

Robin.
_______________________________________________
iommu mailing list
[email protected]
https://lists.linuxfoundation.org/mailman/listinfo/iommu
