On Tue, 2019-04-09 at 15:31 +0200, [email protected] wrote:
> On Tue, Apr 09, 2019 at 01:04:51PM +0000, Thomas Hellstrom wrote:
> > On the VMware platform we have two possible vIOMMUs, the AMD IOMMU
> > and Intel VT-d. Given those conditions I believe the patch is
> > functionally correct. We can't cover the AMD case with
> > intel_iommu_enabled. Furthermore, the only form of incoherency that
> > can affect our graphics device is someone forcing SWIOTLB, in which
> > case that person would be happier with software rendering. In any
> > case, observing the fact that the direct_ops are not used makes
> > sure that SWIOTLB is not used. Knowing that we're on the VMware
> > platform, we're coherent and can safely have the dma layer do dma
> > address translation for us. All this information was not explicitly
> > written in the changelog, no.
>
> We have a series pending that might bounce your buffers even when
> using the Intel IOMMU, which should eventually also find its way
> to other IOMMUs:
>
> https://lists.linuxfoundation.org/pipermail/iommu/2019-March/034090.html
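
To spell out what that would mean for a driver: once pages can be
bounced, every CPU access to a streaming mapping has to be bracketed
by dma_*sync_* calls, roughly like the fragment below. This is an
illustrative sketch only, not vmwgfx code; the function and parameter
names are made up.

#include <linux/dma-mapping.h>

static int cpu_access_example(struct device *dev, struct page *page,
                              size_t len)
{
        dma_addr_t addr;

        addr = dma_map_page(dev, page, 0, len, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev, addr))
                return -ENOMEM;

        /* Hand the buffer to the CPU; with bouncing this copies data
         * back from the bounce buffer. */
        dma_sync_single_for_cpu(dev, addr, len, DMA_BIDIRECTIONAL);

        /* ... CPU reads/writes the page here ... */

        /* Hand it back to the device; with bouncing this copies data
         * out to the bounce buffer again. */
        dma_sync_single_for_device(dev, addr, len, DMA_BIDIRECTIONAL);

        dma_unmap_page(dev, addr, len, DMA_BIDIRECTIONAL);
        return 0;
}
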
If that's the case, I think most of the graphics drivers will stop
functioning. I don't think people would want that, and even if the
graphics drivers are "to blame" for not implementing the sync calls, I
think the work involved in getting things right is impressive, if it's
at all possible.

> > In any case, assuming that that patch is reverted due to the
> > layering violation, are you willing to help out with a small API to
> > detect the situation where streaming DMA is incoherent?
>
> The short but sad answer is that we can't ever guarantee that you
> can skip the dma_*sync_* calls. There are too many factors in play
> that might require it at any time - working around unaligned
> addresses in IOMMUs, CPUs that are coherent for some devices and not
> others, addressing limitations both in physical CPUs and in VMs (see
> the various "secure VM" concepts floating around at the moment).
>
> If you want to avoid the dma_*sync_* calls you must use
> dma_alloc_coherent to allocate the memory. Note that the memory for
> dma_alloc_coherent actually comes from the normal page pool most of
> the time, and for certain on x86, which seems to be what you care
> about. The times of it dipping into the tiny swiotlb pool are long
> gone. So at least for you I see absolutely no reason not to simply
> always use dma_alloc_coherent to start with. For other uses that
> involve platforms without DMA-coherent devices, like arm, the
> tradeoffs might be a little different.

There are two things that concern me with dma_alloc_coherent:

1) It seems to want pages mapped either in the kernel map or vmapped.
Graphics drivers allocate huge amounts of memory, typically up to 50%
of system memory or more. On a 32-bit PAE system I'm afraid of running
out of vmap space as well as not being able to allocate as much memory
as I want. Perhaps a dma_alloc_coherent() interface that returns a
page rather than a virtual address would do the trick.

2) Exporting using dma-buf. A page allocated using dma_alloc_coherent()
for one device might not be coherent for another device. What happens
if I allocate a page using dma_alloc_coherent() for device 1 and then
want to map it using dma_map_page() for device 2? (A rough sketch of
the pattern I mean follows below my signature.)

Thanks,
Thomas
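
P.S. To make (2) a bit more concrete, below is roughly the pattern I
have in mind. It is an illustrative sketch only; dev1/dev2 stand for
the exporting and importing devices, and whether the virt_to_page() /
dma_map_page() steps are even legal here is exactly the question.

#include <linux/dma-mapping.h>
#include <linux/mm.h>

static int export_example(struct device *dev1, struct device *dev2)
{
        dma_addr_t handle1, handle2;
        void *cpu_addr;
        struct page *page;

        /* Coherent allocation done on behalf of device 1. */
        cpu_addr = dma_alloc_coherent(dev1, PAGE_SIZE, &handle1, GFP_KERNEL);
        if (!cpu_addr)
                return -ENOMEM;

        /* Only meaningful if the memory came from the normal page pool
         * and is mapped in the kernel map, which is concern (1). */
        page = virt_to_page(cpu_addr);

        /* Streaming mapping for device 2, e.g. on dma-buf export. Is
         * the result coherent for dev2, or does it need dma_*sync_*? */
        handle2 = dma_map_page(dev2, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);
        if (dma_mapping_error(dev2, handle2)) {
                dma_free_coherent(dev1, PAGE_SIZE, cpu_addr, handle1);
                return -ENOMEM;
        }

        /* ... access from both devices ... */

        dma_unmap_page(dev2, handle2, PAGE_SIZE, DMA_BIDIRECTIONAL);
        dma_free_coherent(dev1, PAGE_SIZE, cpu_addr, handle1);
        return 0;
}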
