On Wed, 2013-09-25 at 13:44 -0600, Alex Williamson wrote:
> On Wed, 2013-09-25 at 19:52 +0100, David Woodhouse wrote:
> > On Wed, 2013-09-25 at 11:36 -0600, Alex Williamson wrote:
> > > On Wed, 2013-09-25 at 17:05 +0100, David Woodhouse wrote:
> > > > Why would it ever care? If it *happens* to map something that can use
> > > > large pages, yay! If it subsequently breaks apart those large pages by
> > > > unmapping 4KiB in the middle, let the IOMMU driver break that apart.
> > >
> > > Can this be done atomically? I thought part of the reason for this
> > > interface was that iommu drivers typically couldn't replace a huge page
> > > with multiple smaller pages in the presence of DMA.
> >
> > For the Intel IOMMU it can. You can atomically change from a large page
> > entry to a pointer to a full set of smaller page tables. Do the IOTLB
> > flush, and at no time is there an interruption in service.
>
> Cool
>
> > Not sure if this is true for *all* IOMMU hardware; I'd be perfectly
> > happy to accept a variant of Jörg's proposal that we should only ever
> > unmap exactly the same range that we mapped. Except we should allow the
> > unmapping of adjacent regions together; just not a partial unmap of
> > something that was mapped in one go.
>
> Well, except if we've just trusted the IOMMU driver to add a device
> behind a non-SP-capable IOMMU to our domain and convert the page tables,
> that partial unmap is no longer partial, and now we get different
> behavior than before, so we can't depend on that adjacent unmapping.
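
[Editor's note: to illustrate the kind of atomic conversion described above (a superpage entry swung over to a full set of 4KiB page-table entries covering the same range, followed by an IOTLB flush), here is a minimal sketch. The helper name, constants and PTE bit definitions are made up for illustration; this is not the actual intel-iommu code.]

    #include <linux/gfp.h>
    #include <linux/io.h>
    #include <linux/types.h>

    #define SP_PTES_PER_TABLE  512
    #define SP_PAGE_SIZE       (2UL << 20)   /* 2MiB superpage */
    #define SMALL_PAGE_SIZE    4096UL
    #define PTE_LARGE_PAGE     (1ULL << 7)   /* illustrative "superpage" bit */

    /*
     * Replace one 2MiB superpage PTE with a pointer to a freshly built
     * table of 4KiB PTEs covering the same physical range.  The new
     * table translates identically to the old superpage, so DMA racing
     * with the switch sees a valid translation either way; the caller
     * flushes the IOTLB for the range afterwards.
     */
    static int split_superpage(u64 *sp_pte, u64 prot_bits)
    {
            u64 old = *sp_pte;
            phys_addr_t phys = old & ~((u64)SP_PAGE_SIZE - 1);
            u64 *table;
            int i;

            if (!(old & PTE_LARGE_PAGE))
                    return 0;               /* already small pages */

            table = (u64 *)get_zeroed_page(GFP_ATOMIC);
            if (!table)
                    return -ENOMEM;

            /* Build the complete 4KiB table before it becomes visible. */
            for (i = 0; i < SP_PTES_PER_TABLE; i++)
                    table[i] = (phys + i * SMALL_PAGE_SIZE) | prot_bits;

            /*
             * One aligned 64-bit store swings the entry from "superpage"
             * to "pointer to page table", so the hardware walker never
             * sees a missing mapping.
             */
            *sp_pte = virt_to_phys(table) | (prot_bits & ~PTE_LARGE_PAGE);

            /* Caller: flush the IOTLB for this 2MiB range. */
            return 0;
    }
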
Que? Jörg's proposal was that if you add a mapping at a given
address+size, you should always remove *exactly* that address+size.
Which will always work exactly the same, regardless of superpages.

My slight change to that was that if you also added an *adjacent*
mapping at address2+size2, you should be able to unmap both at the
same time. Which will *also* always work the same regardless of
superpages. Even if your two mappings were also *physically*
contiguous, and *could* have used superpages, they probably won't
anyway because you mapped them in two parts.

> > > What happens if my IOMMU domain makes use of super pages and
> > > then I add a new device behind a new IOMMU without hardware super
> > > page support?
> >
> > Currently, you end up with the domain happily including superpages,
> > and the less capable IOMMU that you added later won't cope.
>
> This is the trouble with trusting the iommu driver. ;)

Sorry, I should have made it clearer that this is a *bug*. It's not by
design. The IOMMU driver ought to get this right, and will do.

> > What we probably *ought* to do is walk the page tables and convert
> > any pre-existing superpages to small pages, at the time we add the
> > non-SP-capable IOMMU.
>
> And then we need to figure out how to handle that in the proposed
> interface changes above, since it changes the unmap behavior for the
> naive user.

Isn't that what you'd *expect*? Surely you don't *expect* the breakage
you currently get?

> There's also the question of whether the IOMMU driver should
> re-evaluate super pages when the less capable IOMMU is removed from
> the domain.

I wouldn't bother to go looking for opportunities to use superpages if
we remove the last non-SP-capable IOMMU from the domain.

> > FWIW we currently screw up the handling of cache-coherent vs.
> > non-coherent page tables too. That one wants a wbinvd somewhere when
> > we add a non-coherent IOMMU to the domain.
>
> You're not selling the "trust the IOMMU driver" story very well here.
> Can we assume that the IOMMU_CACHE flag (SNP) is ignored appropriately
> by non-coherent IOMMUs? Is there any downside to ignoring it and
> always setting SNP in the IOMMU page tables? AMD IOMMU ignores it,
> but it's also always cache coherent. Thanks,

SNP is a separate issue. I'm speaking of cache coherency of the
hardware page table walk, i.e. the feature bit that all the horrid
clflush calls are predicated on. Again, this is just a bug. We *should*
be getting this right, but don't yet.

--
dwmw2
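
[Editor's note: to make the proposed map/unmap contract concrete, here is a sketch against the existing iommu_map()/iommu_unmap() API from <linux/iommu.h>. The IOVA, the sizes and the helper function are invented for illustration; the point is only which unmap calls the rule discussed above would and would not permit.]

    #include <linux/iommu.h>

    static void map_unmap_contract(struct iommu_domain *domain,
                                   phys_addr_t pa1, phys_addr_t pa2)
    {
            const size_t sz = 2UL << 20;         /* two 2MiB regions */
            unsigned long iova = 0x40000000;

            /* Two adjacent mappings, created separately.
             * (Error handling omitted for brevity.) */
            iommu_map(domain, iova,      pa1, sz, IOMMU_READ | IOMMU_WRITE);
            iommu_map(domain, iova + sz, pa2, sz, IOMMU_READ | IOMMU_WRITE);

            /*
             * Allowed under Jörg's rule: unmap exactly what was mapped.
             *   iommu_unmap(domain, iova, sz);
             *   iommu_unmap(domain, iova + sz, sz);
             *
             * Allowed under David's relaxation: unmap both adjacent
             * mappings in a single call, as below.
             *
             * Not allowed: a partial unmap of a single mapping, e.g.
             * 4KiB out of the middle of the first region:
             *   iommu_unmap(domain, iova + 0x1000, 0x1000);
             */
            iommu_unmap(domain, iova, 2 * sz);
    }
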
