Avi Kivity wrote:
> Han, Weidong wrote:
>>> If we devolve this to the iommu API, the same io page table can be
>>> shared by all iommus, so long as they all use the same page table
>>> format.
>>>
>>
>> I don't understand how to handle this by the iommu API. Let me explain
>> my thoughts more clearly:
>>
>> The VT-d spec says:
>> Context-entries programmed with the same domain identifier must
>> always reference the same address translation structure (through the
>> ASR field). Similarly, context-entries referencing the same address
>> translation structure must be programmed with the same domain id.
>>
>> In the native VT-d driver, dmar_domain is per device, and has its own
>> VT-d page table, which is dynamically set up before each DMA. So it is
>> impossible for the same VT-d page table to be shared by all iommus.
>> Moreover, different iommus in a system may have different page table
>> levels.
>
> Right. This use case is in essence to prevent unintended sharing. It
> is also likely to have low page table height, since dma sizes are
> relatively small.
>
>> I think it's enough that the iommu API tells us the iommu of a
>> device.
>>
>
> While this is tangential to our conversation, why? Even for the
> device driver use case, this only makes the API more complex. If the
> API hides the existence of multiple iommus, it's easier to use and
> harder to make a mistake.
>
>> Whereas on the KVM side, the same VT-d page table can be shared by
>> devices which are under the same iommu and assigned to the same guest,
>> because all of the guest's memory is statically mapped in the VT-d page
>> table. But it needs to wrap dmar_domain; this patch wraps it with a
>> reference count for multiple devices that relate to the same dmar_domain.
>>
>> This patch already adds an API (intel_iommu_device_get_iommu()) in
>> intel-iommu.c, which returns the iommu of a device.
>
> There is a missed optimization here. Suppose we have two devices, each
> under a different iommu.
> With the patch, each will be in a different
> dmar_domain and so will have a different page table. The amount of
> memory used is doubled.
You cannot let two devices each under a different iommu share one
dmar_domain, because dmar_domain has a pointer to its iommu.

> Suppose the iommu API hides the existence of multiple iommus. You
> allocate a translation and add devices to it. When you add a device,
> the iommu API checks which iommu is needed and programs it
> accordingly, but only one io page table is used.
>
> The other benefit is that iommu developers understand these issues
> while kvm developers don't, so it's best managed by the iommu API.
> This way if things change (as usual, becoming more complicated), the
> iommu developers can make the changes in their code and hide the
> complexity from kvm or other users.
>
> I'm probably (badly) duplicating Joerg's iommu API here, but this is
> how it could go:
>
> iommu_translation_create() - creates an iommu translation object; this
> allocates the page tables but doesn't do anything with them
> iommu_translation_map() - adds pages to the translation
> iommu_translation_attach() - attaches a device to the translation; this
> locates the iommu and programs it
> _detach(), _unmap(), and _free() undo these operations.

In fact, the exported APIs added for KVM VT-d also provide
create/map/attach/detach/free functions, although these iommu APIs are
more readable. Because the KVM VT-d usage differs from the native
usage, it is inevitable that we extend the native VT-d code to support
KVM VT-d (such as wrapping dmar_domain). Devices under different
iommus cannot share the same dmar_domain, and thus cannot share a
VT-d page table. If we want to handle this through iommu APIs, I
suspect we would need to change a lot of the native VT-d driver code.

David/Jesse, what's your opinion?
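To make the discussion concrete, here is a minimal userspace sketch of
the iommu_translation_* interface Avi proposed above. This is NOT the
real intel-iommu code or Joerg's API; the struct layouts, fields, and
the fixed-size iommu list are all hypothetical, standing in for the
kernel's page tables and context-entry programming. The point it
illustrates is the missed optimization: one translation object owns a
single shared io page table, and attach() programs each iommu for that
table at most once, no matter how many devices sit under it.

```c
/*
 * Sketch only: userspace mock of the proposed API. All internals
 * (fields, MAX_IOMMUS, the placeholder pgtable) are assumptions,
 * not the real VT-d data structures.
 */
#include <assert.h>
#include <stdlib.h>

#define MAX_IOMMUS 4

struct iommu {
	int id;
	int programmed;		/* how many times this iommu was programmed */
};

struct device {
	struct iommu *iommu;	/* the iommu this device sits under */
};

/* One translation object: a single io page table shared by all iommus. */
struct iommu_translation {
	void *pgtable;			/* stands in for the real page tables */
	struct iommu *iommus[MAX_IOMMUS];	/* iommus already programmed */
	int niommus;
};

/* Allocate the page tables but don't touch any iommu yet. */
struct iommu_translation *iommu_translation_create(void)
{
	struct iommu_translation *t = calloc(1, sizeof(*t));

	if (t)
		t->pgtable = calloc(1, 4096);	/* placeholder root table */
	return t;
}

/* Add a mapping; a real implementation would populate pgtable entries. */
int iommu_translation_map(struct iommu_translation *t,
			  unsigned long iova, unsigned long paddr)
{
	(void)iova;
	(void)paddr;
	return (t && t->pgtable) ? 0 : -1;
}

/*
 * Attach a device: locate its iommu and program it to use the shared
 * page table, but only once per iommu. Further devices under an
 * already-programmed iommu cost nothing extra.
 */
int iommu_translation_attach(struct iommu_translation *t, struct device *dev)
{
	struct iommu *iommu = dev->iommu;
	int i;

	for (i = 0; i < t->niommus; i++)
		if (t->iommus[i] == iommu)
			return 0;	/* this iommu already references pgtable */

	if (t->niommus == MAX_IOMMUS)
		return -1;
	t->iommus[t->niommus++] = iommu;
	iommu->programmed++;	/* context entry would point at t->pgtable */
	return 0;
}
```

With this shape, attaching two devices under two different iommus
programs each iommu once against the same pgtable, instead of
duplicating the page table per dmar_domain as the patch under
discussion does.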