On Tue, Apr 16, 2024 at 5:55 PM Peter Xu <pet...@redhat.com> wrote:
>
> On Tue, Apr 16, 2024 at 03:28:41PM +0200, Jürgen Groß wrote:
> > On 16.04.24 13:32, Edgar E. Iglesias wrote:
> > > On Wed, Apr 10, 2024 at 8:56 PM Peter Xu <pet...@redhat.com> wrote:
> > > >
> > > > On Wed, Apr 10, 2024 at 06:44:38PM +0200, Edgar E. Iglesias wrote:
> > > > > On Tue, Feb 27, 2024 at 11:37 PM Vikram Garhwal
> > > > > <vikram.garh...@amd.com> wrote:
> > > > >
> > > > > > From: Juergen Gross <jgr...@suse.com>
> > > > > >
> > > > > > In order to support mapping and unmapping guest memory
> > > > > > dynamically to and from qemu during address_space_[un]map()
> > > > > > operations, add the map() and unmap() callbacks to
> > > > > > MemoryRegionOps.
> > > > > >
> > > > > > Those will be used e.g. for Xen grant mappings when performing
> > > > > > guest I/Os.
> > > > > >
> > > > > > Signed-off-by: Juergen Gross <jgr...@suse.com>
> > > > > > Signed-off-by: Vikram Garhwal <vikram.garh...@amd.com>
> > > > >
> > > > > Paolo, Peter, David, Philippe, do you guys have any concerns with
> > > > > this patch?
> > >
> > > Thanks for your comments Peter,
> > >
> > > > This introduces a 3rd memory type afaict, neither direct nor !direct.
> > > >
> > > > What happens if someone does address_space_write() to it? I didn't
> > > > see it covered here..
> > >
> > > You're right, that won't work; the memory needs to be mapped before
> > > it can be used. At minimum there should be some graceful failure,
> > > right now this will crash.
> > >
> > > > OTOH, the cover letter didn't mention too much either on the big
> > > > picture:
> > > >
> > > > https://lore.kernel.org/all/20240227223501.28475-1-vikram.garh...@amd.com/
> > > >
> > > > I want to have a quick grasp on whether it's justified and
> > > > worthwhile to introduce this complexity to the qemu memory core.
> > > >
> > > > Could I request a better cover letter when you repost? It'll be
> > > > great to mention things like:
> > >
> > > I'll do that, but also answer inline in the meantime since we should
> > > perhaps change the approach.
> > >
> > > > - what is grant mapping, why it needs to be used, when it can be
> > > >   used (is it only relevant to vIOMMU=on)? Some more information on
> > > >   the high level design using this type of MR would be great.
> > >
> > > https://github.com/xen-project/xen/blob/master/docs/misc/grant-tables.txt
> > >
> > > Xen VMs that use QEMU's VirtIO have a QEMU instance running in a
> > > separate VM.
> > >
> > > There are basically two mechanisms for QEMU's VirtIO backends to
> > > access the guest's RAM:
> > > 1. Foreign mappings. This gives the VM running QEMU access to the
> > > entire RAM of the guest VM.
> >
> > Additionally it requires qemu to run in dom0, while in general Xen
> > allows running backends in less privileged "driver domains", which are
> > usually not allowed to perform foreign mappings.
> >
> > > 2. Grant mappings. This allows the guest to dynamically grant and
> > > remove access to pages as needed. So the VM running QEMU cannot map
> > > guest RAM unless it's been instructed to do so by the guest.
> > >
> > > #2 is desirable because if QEMU gets compromised it has a smaller
> > > attack surface onto the guest.
> >
> > And it allows running the virtio backend in a less privileged VM.
> >
> > > > - why a 3rd memory type is required? Do we have other alternatives?
> > >
> > > Yes, there are alternatives.
> > >
> > > 1. It was suggested by Stefano to try to handle this in the existing
> > > qemu/hw/xen/*. This would be less intrusive but perhaps also less
> > > explicit. Concerns about touching the Memory API have been raised
> > > before, so perhaps we should try this.
> > > I'm a little unsure how we would deal with unmapping when the guest
> > > removes our grants and we're using models that don't map but use
> > > address_space_read/write().
> >
> > Those would either need to use grant-copy hypercalls, or they'd need
> > to map, read/write, unmap.
> >
> > > 2. Another approach could be to change the Xen grant-iommu in the
> > > Linux kernel to work with a grant vIOMMU in QEMU. Linux could
> > > explicitly ask QEMU's grant vIOMMU to map/unmap granted regions.
> > > This would have the benefit that we wouldn't need to allocate
> > > address-bit 63 for grants. A drawback is that it may be slower, since
> > > we're going to need to bounce between guest/qemu a bit more.
> >
> > It would be a _lot_ slower, unless virtio-iommu and grants are both
> > modified to match. I have looked into that, but the needed effort is
> > rather large. At the last Xen summit I suggested introducing a new
> > grant format which would work more like a normal page table structure.
> > Using the same format for virtio-iommu would allow avoiding the
> > additional bounces between qemu and the guest (and in fact that was
> > one of the motivations to suggest the new grant format).
>
> I have a better picture now, thanks both.
>
> It really looks like a vIOMMU already to me, perhaps with a special
> refID mapping playing a similar role to IOVAs in the rest of the IOMMU
> world.
>
> I can't yet tell what's the best way for Xen - as of now QEMU's memory
> API does provide such translations via IOMMUMemoryRegionClass.translate(),
> but only from that. So far it works for all vIOMMU emulations in QEMU,
> and I'd hope we don't need to hack in another memory type if possible,
> especially if it's for performance's sake; more on that below.
>
> QEMU also suffers from similar issues with other kinds of DMA
> protections, at least that's what I'm aware of with either VT-d, SMMU,
> etc., where dynamic DMA mappings will slow the IOs down to a degree that
> may not be usable in real production. We kept it like that and so far
> AFAIK we don't yet have a solution, simply because of the nature of how
> DMA buffers are mapped and managed within a guest OS, no matter Linux or
> not.
>
> For Linux as a guest we basically suggest enabling iommu=pt so that
> kernel drivers are trusted, and kernel-driven devices can have full
> access to guest RAM by using the vIOMMU's passthrough mode. Perhaps
> similar to foreign mappings for Xen, but maybe still different, as Xen's
> topology is definitely special as a hypervisor here.
>
> Userspace drivers within the guest OS, on the other hand, always go
> through vfio-pci now, which will enforce effective DMA mappings, not the
> passthrough mode. There it's suggested to map as little as possible,
> e.g. DPDK only maps at the start of the user driver, so it's mostly not
> affected by the slowness of frequently changing DMA mappings.
>
> I'm not sure whether the above ideas are even applicable to Xen, but I
> just wanted to share the status quo on how we manage protected DMAs
> without Xen, in case there's anything useful to help route the path
> forward.
>
> Thanks,
>
> --
> Peter Xu
>
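To make the "map, read/write, unmap" option a bit more concrete for myself,
below is a rough, untested sketch of what a model that today calls
address_space_read() directly would have to do instead, so that the grant
region's map()/unmap() callbacks get a chance to run. The XEN_GRANT_ADDR_OFF
bit-63 marker and the helper name are placeholders of mine, not something
from the series:

#include "qemu/osdep.h"
#include "exec/memory.h"

/* Placeholder for the address-bit-63 grant marker discussed above. */
#define XEN_GRANT_ADDR_OFF  (1ULL << 63)

/*
 * Read 'len' bytes of granted guest memory at 'gpa' into 'buf' by going
 * through address_space_map()/address_space_unmap() instead of
 * address_space_read(), so the region's map()/unmap() callbacks can
 * establish and tear down the grant mapping on demand.
 */
static MemTxResult granted_space_read(AddressSpace *as, hwaddr gpa,
                                      void *buf, hwaddr len)
{
    hwaddr plen = len;
    void *p = address_space_map(as, gpa | XEN_GRANT_ADDR_OFF, &plen,
                                false, MEMTXATTRS_UNSPECIFIED);

    if (!p) {
        return MEMTX_ERROR;
    }
    if (plen < len) {
        /* Mapping was truncated; real code would loop over the chunks. */
        address_space_unmap(as, p, plen, false, 0);
        return MEMTX_ERROR;
    }

    memcpy(buf, p, len);
    address_space_unmap(as, p, plen, false, len);
    return MEMTX_OK;
}

Writes would look the same with is_write set to true, and the unmap path is
where the guest's grant could be dropped again.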
Thanks for the suggestions Peter and for your comments Jürgen. We'll have
to evaluate the different approaches a little more and see where we go
from here.

Best regards,
Edgar
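PS: To convince myself that the vIOMMU route fits the existing memory API,
I also sketched the rough shape an IOMMUMemoryRegionClass.translate() hook
for a grant vIOMMU could take. This is only a sketch under my own
assumptions; xen_grant_lookup() is a made-up placeholder for whatever would
resolve a grant reference to a guest-physical address, and none of this is
meant as a concrete proposal yet:

#include "qemu/osdep.h"
#include "exec/memory.h"

#define GNT_PAGE_SIZE  4096ULL
#define GNT_PAGE_MASK  (~(GNT_PAGE_SIZE - 1))

typedef struct XenGrantIOMMU {
    IOMMUMemoryRegion iommu_mr;   /* parent object */
    AddressSpace *target_as;      /* where translated accesses end up */
} XenGrantIOMMU;

/*
 * Placeholder: resolve the grant reference encoded in 'addr' to a guest
 * physical address. Real code would consult the grant table instead.
 */
static bool xen_grant_lookup(XenGrantIOMMU *s, hwaddr addr, hwaddr *gpa)
{
    return false;
}

static IOMMUTLBEntry xen_grant_translate(IOMMUMemoryRegion *iommu_mr,
                                         hwaddr addr, IOMMUAccessFlags flag,
                                         int iommu_idx)
{
    XenGrantIOMMU *s = container_of(iommu_mr, XenGrantIOMMU, iommu_mr);
    IOMMUTLBEntry entry = {
        .target_as = s->target_as,
        .iova = addr & GNT_PAGE_MASK,
        .translated_addr = 0,
        .addr_mask = GNT_PAGE_SIZE - 1,
        .perm = IOMMU_NONE,           /* fault by default */
    };
    hwaddr gpa;

    if (xen_grant_lookup(s, addr, &gpa)) {
        entry.translated_addr = gpa & GNT_PAGE_MASK;
        entry.perm = flag;            /* a real grant also encodes RO/RW */
    }
    return entry;
}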