On Tue, Apr 16, 2024 at 03:28:41PM +0200, Jürgen Groß wrote:
> On 16.04.24 13:32, Edgar E. Iglesias wrote:
> > On Wed, Apr 10, 2024 at 8:56 PM Peter Xu <pet...@redhat.com> wrote:
> > > 
> > > On Wed, Apr 10, 2024 at 06:44:38PM +0200, Edgar E. Iglesias wrote:
> > > > On Tue, Feb 27, 2024 at 11:37 PM Vikram Garhwal <vikram.garh...@amd.com>
> > > > wrote:
> > > > 
> > > > > From: Juergen Gross <jgr...@suse.com>
> > > > > 
> > > > > In order to support mapping and unmapping guest memory dynamically
> > > > > to and from qemu during address_space_[un]map() operations, add the
> > > > > map() and unmap() callbacks to MemoryRegionOps.
> > > > > 
> > > > > Those will be used e.g. for Xen grant mappings when performing guest
> > > > > I/Os.
> > > > > 
> > > > > Signed-off-by: Juergen Gross <jgr...@suse.com>
> > > > > Signed-off-by: Vikram Garhwal <vikram.garh...@amd.com>
> > > > > 
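
Just to make sure I'm reading the interface right: I assume it's roughly
something like the below that gets added (hypothetical names/signatures,
my guess from the description rather than copied from the patch), so that
address_space_map()/unmap() can defer to the region instead of requiring
direct RAM access?

    /* Hypothetical sketch, not the actual patch. */
    struct MemoryRegionOps {
        /* ... existing read/write/endianness/valid fields ... */

        /* Return a host pointer covering up to *plen bytes at @addr
         * (may shrink *plen), valid until unmap() is called. */
        void *(*map)(MemoryRegion *mr, hwaddr addr, hwaddr *plen,
                     bool is_write);

        /* Tear down a mapping previously returned by map(). */
        void (*unmap)(MemoryRegion *mr, void *buffer, hwaddr addr,
                      hwaddr len, bool is_write, hwaddr access_len);
    };
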
> > > > 
> > > > 
> > > > Paolo, Peter, David, Philippe, do you guys have any concerns with
> > > > this patch?
> > > 
> > 
> > Thanks for your comments, Peter,
> > 
> > 
> > > This introduces a 3rd memory type afaict, neither direct nor !direct.
> > > 
> > > What happens if someone does address_space_write() to it?  I didn't see it
> > > covered here..
> > 
> > You're right, that won't work; the memory needs to be mapped before it
> > can be used.
> > At a minimum there should be some graceful failure; right now this will
> > crash.
> > 
> > > 
> > > OTOH, the cover letter didn't mention too much either on the big picture:
> > > 
> > > https://lore.kernel.org/all/20240227223501.28475-1-vikram.garh...@amd.com/
> > > 
> > > I want to get a quick grasp on whether it's worthwhile to introduce
> > > this complexity to the qemu memory core.
> > > 
> > > Could I request a better cover letter when reposting?  It'll be great to
> > > mention things like:
> > 
> > I'll do that, but also answer inline in the meantime since we should
> > perhaps change the approach.
> > 
> > > 
> > >    - what grant mapping is, why it needs to be used, and when it can
> > >      be used (is it only relevant to vIOMMU=on)?  Some more information
> > >      on the high-level design using this type of MR would be great.
> > 
> > https://github.com/xen-project/xen/blob/master/docs/misc/grant-tables.txt
> > 
> > Xen VMs that use QEMU's VirtIO have a QEMU instance running in a
> > separate VM.
> > 
> > There are basically two mechanisms for QEMU's Virtio backends to access
> > the guest's RAM:
> > 1. Foreign mappings.  This gives the VM running QEMU access to the
> > entire RAM of the guest VM.
> 
> Additionally, it requires qemu to run in dom0, while in general Xen allows
> backends to run in less privileged "driver domains", which are usually not
> allowed to perform foreign mappings.
> 
> > 2. Grant mappings.  This allows the guest to dynamically grant and
> > revoke access to individual pages as needed.
> > So the VM running QEMU cannot map guest RAM unless it has been
> > instructed to do so by the guest.
> > 
> > #2 is desirable because, if QEMU gets compromised, it has a smaller
> > attack surface on the guest.
> 
> And it allows running the virtio backend in a less privileged VM.
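
Thanks, that helps.  For my own understanding, the difference at the
libxen* API level is roughly the below, right?  A sketch from memory, with
guest_domid/gfn/gref as placeholders and error handling omitted:

    #include <stdint.h>
    #include <sys/mman.h>
    #include <xenforeignmemory.h>
    #include <xengnttab.h>

    static void compare_mappings(uint32_t guest_domid, xen_pfn_t gfn,
                                 uint32_t gref)
    {
        int err;

        /* 1) Foreign mapping: a privileged qemu can map any guest frame,
         *    addressed directly by its gfn. */
        xenforeignmemory_handle *fmem = xenforeignmemory_open(NULL, 0);
        void *p = xenforeignmemory_map(fmem, guest_domid,
                                       PROT_READ | PROT_WRITE, 1, &gfn, &err);

        /* 2) Grant mapping: qemu can only map frames the guest has granted,
         *    named by a grant reference the guest handed over. */
        xengnttab_handle *xgt = xengnttab_open(NULL, 0);
        void *q = xengnttab_map_grant_ref(xgt, guest_domid, gref,
                                          PROT_READ | PROT_WRITE);

        /* ... do the I/O through p or q ... */

        xengnttab_unmap(xgt, q, 1);
        xenforeignmemory_unmap(fmem, p, 1);
    }
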
> 
> > 
> > > 
> > >    - why a 3rd memory type is required?  Do we have other alternatives?
> > 
> > Yes, there are alternatives.
> > 
> > 1. Stefano suggested trying to handle this in the existing
> > qemu/hw/xen/* code.
> > This would be less intrusive but perhaps also less explicit.
> > Concerns about touching the Memory API have been raised before, so
> > perhaps we should try this.
> > I'm a little unsure how we would deal with unmapping when the guest
> > removes our grants and we're using models that don't map but use
> > address_space_read/write().
> 
> Those would either need to use grant-copy hypercalls, or they'd need to map,
> read/write, unmap.
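
Right, so with models that today just call address_space_read/write(),
each access would effectively become something like the below hypothetical
helper (sketch only, minimal error handling), or alternatively a
grant-copy (IIRC xengnttab_grant_copy()) so that Xen does the copy without
qemu mapping anything:

    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <xengnttab.h>

    /* Hypothetical: write @len bytes from @buf at @offset inside a granted
     * page, mapping the grant only for the duration of the access.
     * (@offset + @len must stay within one granted page.) */
    static int grant_write(xengnttab_handle *xgt, uint32_t domid,
                           uint32_t gref, size_t offset,
                           const void *buf, size_t len)
    {
        void *p = xengnttab_map_grant_ref(xgt, domid, gref, PROT_WRITE);

        if (!p) {
            return -1;
        }
        memcpy((uint8_t *)p + offset, buf, len);
        return xengnttab_unmap(xgt, p, 1);
    }
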
> 
> > 
> > 2. Another approach could be to change the Xen grant-iommu in the
> > Linux kernel to work with a grant vIOMMU in QEMU.
> > Linux could explicitly ask QEMU's grant vIOMMU to map/unmap granted regions.
> > This would have the benefit that we wouldn't need to allocate
> > address-bit 63 for grants.
> > A drawback is that it may be slower, since we would need to bounce
> > between the guest and qemu more often.
> 
> It would be a _lot_ slower, unless virtio-iommu and grants are both modified
> to match. I have looked into that, but the needed effort is rather large. At
> the last Xen summit I suggested introducing a new grant format which would
> work more like a normal page-table structure. Using the same format for
> virtio-iommu would make it possible to avoid the additional bounces between
> qemu and the guest (in fact, that was one of the motivations for suggesting
> the new grant format).

I have a better picture now, thanks both.

It really looks like a vIOMMU to me already, perhaps with a special refID
mapping playing a similar role to IOVAs in the rest of the IOMMU world.

I can't yet tell what the best way is for Xen.  As of now QEMU's memory API
does provide such translations via IOMMUMemoryRegionClass.translate(), but
only through that interface.  So far it works for all vIOMMU emulations in
QEMU, and I'd hope we don't need to hack up another memory type for this if
we can avoid it, especially when the motivation is performance; more on
that below.
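
To be concrete on what I meant: a grant vIOMMU could be an IOMMU memory
region whose translate() resolves the grant reference encoded in the IOVA
(e.g. with bit 63 as the "grant" flag, following the grant-DMA scheme
mentioned above).  A very rough sketch; xen_grant_lookup() is a made-up
helper that would map the granted page on demand and return its RAM
address:

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "exec/address-spaces.h"

    /* Made-up helper: resolve the grant ref carried in @iova to the RAM
     * address of the granted page, mapping it on demand if necessary. */
    bool xen_grant_lookup(hwaddr iova, bool is_write, hwaddr *ram_addr);

    static IOMMUTLBEntry xen_grant_translate(IOMMUMemoryRegion *iommu,
                                             hwaddr addr,
                                             IOMMUAccessFlags flag,
                                             int iommu_idx)
    {
        const hwaddr page_mask = ~(hwaddr)0xfff;    /* page granularity */
        IOMMUTLBEntry ret = {
            .target_as = &address_space_memory,
            .iova = addr & page_mask,
            .addr_mask = ~page_mask,
            .perm = IOMMU_NONE,                     /* default: no mapping */
        };
        hwaddr ram_addr;

        if (xen_grant_lookup(addr, flag & IOMMU_WO, &ram_addr)) {
            ret.translated_addr = ram_addr & page_mask;
            ret.perm = flag;
        }
        return ret;
    }
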

QEMU also suffers from similar issues with other kinds of DMA protection,
at least as far as I'm aware with VT-d, SMMU, etc., where dynamic DMA
mappings slow the IOs down to a degree that may not be usable in real
production.  We have kept it like that, and so far AFAIK we don't yet have
a solution, simply because of the way DMA buffers are mapped and managed
within a guest OS, no matter whether it's Linux or not.

For Linux as a guest we basically suggest enabling iommu=pt, so that kernel
drivers are trusted and kernel-driven devices have full access to guest RAM
via the vIOMMU's passthrough mode.  Perhaps that's similar to foreign
mappings for Xen, but maybe still different, as Xen's topology as a
hypervisor is definitely special here.

For userspace drivers within the guest OS, on the other hand, everything
goes through vfio-pci now, which enforces real DMA mappings rather than the
passthrough mode.  There the suggestion is to map as little (and as rarely)
as possible; e.g. DPDK maps everything once at the start of the user
driver, so it's mostly not affected by the slowness of frequently changing
DMA mappings.
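
The map-once pattern, from the guest userspace driver's point of view,
roughly looks like the below (a sketch with error checks omitted; DPDK
wraps this in its own EAL/VFIO layer, this is just the idea):

    #include <stdint.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <linux/vfio.h>

    /* Map one big DMA buffer up-front, for the lifetime of the driver, so
     * the data path never needs to change the IOMMU mappings again. */
    static void *map_dma_once(int container_fd, uint64_t iova, size_t size)
    {
        void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        struct vfio_iommu_type1_dma_map map = {
            .argsz = sizeof(map),
            .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
            .vaddr = (uintptr_t)buf,
            .iova  = iova,
            .size  = size,
        };

        ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
        return buf;
    }
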

I'm not sure whether the above ideas are even applicable to Xen, but I just
wanted to share the status quo of how we manage protected DMA without Xen,
in case there's anything useful to help route the path forward.

Thanks,

-- 
Peter Xu

