On Thu, 2008-08-21 at 12:13 +0530, Amit Shah wrote:
> * On Thursday 07 Aug 2008 19:44:47 Ben-Ami Yassour wrote:
> > Based on a patch by: Kay, Allen M <[EMAIL PROTECTED]>
> >
> > This patch enables pci device assignment based on VT-d support.
> > When a device is assigned to the guest, the guest memory is pinned and
> > the mapping is updated in the VT-d IOMMU.
> >
> > Signed-off-by: Kay, Allen M <[EMAIL PROTECTED]>
> > Signed-off-by: Weidong Han <[EMAIL PROTECTED]>
> > Signed-off-by: Ben-Ami Yassour <[EMAIL PROTECTED]>
> > ---
> > arch/x86/kvm/Makefile | 3 +
> > arch/x86/kvm/vtd.c | 203
> > ++++++++++++++++++++++++++++++++++++++++++++ arch/x86/kvm/x86.c |
> > 10 ++
> > include/asm-x86/kvm_host.h | 3 +
> > include/linux/kvm_host.h | 32 +++++++
> > virt/kvm/kvm_main.c | 9 ++-
> > 6 files changed, 259 insertions(+), 1 deletions(-)
> > create mode 100644 arch/x86/kvm/vtd.c
>
> > +int kvm_iommu_map_guest(struct kvm *kvm,
> > + struct kvm_assigned_dev_kernel *assigned_dev)
> > +{
> > + struct pci_dev *pdev = NULL;
> > + int rc;
> > +
> > + if (!intel_iommu_found()) {
> > + printk(KERN_ERR "intel iommu not found\n");
> > + return -ENODEV;
> > + }
> > +
> > + printk(KERN_DEBUG "VT-d direct map: host bdf = %x:%x:%x\n",
> > + assigned_dev->host_busnr,
> > + PCI_SLOT(assigned_dev->host_devfn),
> > + PCI_FUNC(assigned_dev->host_devfn));
> > +
> > + pdev = assigned_dev->dev;
> > +
> > + if (pdev == NULL) {
> > + if (kvm->arch.intel_iommu_domain) {
> > + intel_iommu_domain_exit(kvm->arch.intel_iommu_domain);
> > + kvm->arch.intel_iommu_domain = NULL;
> > + }
> > + return -ENODEV;
> > + }
> > +
> > + kvm->arch.intel_iommu_domain = intel_iommu_domain_alloc(pdev);
>
> > +int kvm_iommu_unmap_guest(struct kvm *kvm)
> > +{
> > + struct kvm_assigned_dev_kernel *entry;
> > + struct pci_dev *pdev = NULL;
> > + struct dmar_domain *domain = kvm->arch.intel_iommu_domain;
> > +
> > + /* check if iommu exists and in use */
> > + if (!domain)
> > + return 0;
> > +
> > + list_for_each_entry(entry, &kvm->arch.assigned_dev_head, list) {
> > + printk(KERN_DEBUG "VT-d unmap: host bdf = %x:%x:%x\n",
> > + entry->host_busnr,
> > + PCI_SLOT(entry->host_devfn),
> > + PCI_FUNC(entry->host_devfn));
> > +
> > + for_each_pci_dev(pdev) {
> > + if ((pdev->bus->number == entry->host_busnr) &&
> > + (pdev->devfn == entry->host_devfn))
> > + break;
> > + }
> > +
> > + if (pdev == NULL)
> > + return -ENODEV;
> > +
> > + /* detach kvm dmar domain */
> > + intel_iommu_detach_dev(domain,
> > + pdev->bus->number, pdev->devfn);
> > + }
> > + kvm_iommu_unmap_memslots(kvm);
> > + intel_iommu_domain_exit(domain);
> > + return 0;
> > +}
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index a97157c..5cfc21a 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -35,6 +35,7 @@
> > #include <linux/module.h>
> > #include <linux/mman.h>
> > #include <linux/highmem.h>
> > +#include <linux/intel-iommu.h>
> >
> > #include <asm/uaccess.h>
> > #include <asm/msr.h>
> > @@ -271,9 +272,17 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
> >
> > list_add(&match->list, &kvm->arch.assigned_dev_head);
> >
> > + /* currenlty iommu is required for handling dma */
> > + r = kvm_iommu_map_guest(kvm, match);
> > + if (r)
> > + goto out_list_del;
> > +
>
> This will break pvdma support. Please check if intel-iommu is found and only
> then bail out of here.
>
> Even though pvdma is out-of-tree, it doesn't make sense to restrict our usage
> here. AMD IOMMU support will need this to be handled in the right way as
> well.
>
> > out:
> > mutex_unlock(&kvm->lock);
> > return r;
> > +out_list_del:
> > + list_del(&match->list);
> > + pci_release_regions(dev);
> > out_disable:
> > pci_disable_device(dev);
> > out_put:
> > @@ -4219,6 +4228,7 @@ static void kvm_free_vcpus(struct kvm *kvm)
> >
> > void kvm_arch_destroy_vm(struct kvm *kvm)
> > {
> > + kvm_iommu_unmap_guest(kvm);
>
> Likewise, check if intel-iommu is found as a first step in this function.
>
> As part of getting AMD-iommu and pvdma support in, we should have generic
> function names, but that can be done later.
>
> > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
> > index a18aaad..b703890 100644
> > --- a/include/linux/kvm_host.h
> > +++ b/include/linux/kvm_host.h
> > @@ -285,6 +285,33 @@ int kvm_cpu_has_interrupt(struct kvm_vcpu *v);
> > int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu);
> > void kvm_vcpu_kick(struct kvm_vcpu *vcpu);
> >
> > +#ifdef CONFIG_DMAR
> > +int kvm_iommu_map_pages(struct kvm *kvm, gfn_t base_gfn,
> > + unsigned long npages);
> > +int kvm_iommu_map_guest(struct kvm *kvm,
> > + struct kvm_assigned_dev_kernel *assigned_dev);
> > +int kvm_iommu_unmap_guest(struct kvm *kvm);
> > +#else /* CONFIG_DMAR */
> > +static inline int kvm_iommu_map_pages(struct kvm *kvm,
> > + gfn_t base_gfn,
> > + unsigned long npages)
> > +{
> > + return 0;
> > +}
> > +
> > +static inline int kvm_iommu_map_guest(struct kvm *kvm,
> > + struct kvm_assigned_dev_kernel
> > + *assigned_dev)
> > +{
> > + return -ENODEV;
> > +}
> > +
> > +static inline int kvm_iommu_unmap_guest(struct kvm *kvm)
> > +{
> > + return 0;
> > +}
> > +#endif /* CONFIG_DMAR */
>
> This means you can't use pvdma with iommu. Make these functions that check if
> iommu support for device assignment is enabled. A command-line parameter to
> qemu stating you want to enable vt-d for device assignment would be
> appropriate.
>
> For example, we could have the device assignment parameter changed to look
> like this:
>
> -pcidevice dev=00:13.0,dma=vtd,dma=pvdma
>
> which will mean we should use vtd in conjunction with pvdma for this assigned
> device.
Amit,
I agree with your comments that adding pvdma and AMD iommu support will
require more changes.
But I think that we should merge this code now, knowing that it will
need to change when more functionality is added in the future (isn't
that the way it usually works?).
I think we need to avoid the scenario we had two months ago with a lot
of pending code on a separate tree. It was much harder for people to
review the code and to consider merging it. If we merge this code then
the future changes will hopefully be smaller and easier to review and
merge.
Regards,
Ben
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html