Re: [PATCH 2/2] add initial kvm dev passhtrough support
Resending; the initial email from my Exchange client was rejected due to HTML content.

On 6/12/2013 8:45 AM, Mario Smarduch wrote:

Hi Antonios,
thanks for your feedback. Initially we'll work with static binding to gain performance data, since latency/throughput is key, and later add dynamic binding (as well as re-optimize the affinity code). And, as you already know, we will move towards VFIO, which is a longer-term effort.

+struct kvm_arm_assigned_dev_kernel {
+	struct list_head list;
+	struct kvm_arm_assigned_device dev;
+	irqreturn_t (*irq_handler)(int, void *);
+	void *irq_arg;
+};
+

Instead of irq_arg, isn't something such as target_vcpu more clear?

MS: Agree.

diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 17c5ac7..f4cb804 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -449,6 +449,41 @@ static u32 vgic_get_target_reg(struct kvm *kvm, int irq)
 	return val;
 }

+/* Follow the IRQ vCPU affinity so passthrough device interrupts are injected
+ * on physical CPU they execute.
+ */
+static void vgic_set_passthru_affinity(struct kvm *kvm, int irq, u32 target)
+{
+	struct list_head *dev_list_ptr = &kvm->arch.assigned_dev_head;
+	struct list_head *ptr;
+	struct kvm_arm_assigned_dev_kernel *assigned_dev;
+	struct vgic_dist *dist = &kvm->arch.vgic;
+	char *buf;
+	int cpu, hwirq;
+
+	mutex_lock(&kvm->arch.dev_pasthru_lock);
+	list_for_each(ptr, dev_list_ptr) {
+		assigned_dev = list_entry(ptr,
+				struct kvm_arm_assigned_dev_kernel, list);
+		if (assigned_dev->dev.guest_res.girq == irq) {
+			if (assigned_dev->irq_arg)
+				free_irq(irq, assigned_dev->irq_arg);
+			cpu = kvm->vcpus[target]->cpu;
+			hwirq = assigned_dev->dev.dev_res.hostirq.hwirq;
+			irq_set_affinity(hwirq, cpumask_of(cpu));
+			assigned_dev->irq_arg = kvm->vcpus[target];
+			buf = assigned_dev->dev.dev_res.hostirq.host_name;
+			sprintf(buf, "%s-KVM Pass-through",
+					assigned_dev->dev.dev_res.devname);
+			gic_spi_set_priodrop(hwirq);
+			dist->guest_irq[hwirq - VGIC_NR_PRIVATE_IRQS] = irq;
+			request_irq(hwirq, assigned_dev->irq_handler, 0, buf,
+					assigned_dev->irq_arg);
+		}
+	}
+	mutex_unlock(&kvm->arch.dev_pasthru_lock);
+}
+

Maybe vgic_set_passthru_affinity is not an ideal name for the function, since you do more than that here. After looking at your code, I think things will be much easier if you decouple the host IRQ affinity bits from here. After that, not much stops the affinity from following the CPU a vCPU will execute on. I would rename this to reflect that you enable priodrop for this IRQ here; for example, just vgic_set_passthrough could suffice (I don't like the pasthru abbreviation a lot). Then the affinity bits can be put in a different function.

MJS: Agree, naming could be better.

In arch/arm/kvm/arm.c, kvm_arch_vcpu_load() lets you follow up whenever a vCPU is moved to a different CPU. However, in practice I don't know if the additional complexity of having the IRQ affinity follow the vCPU significantly improves IRQ latency.

MJS: This should save a costly IPI if, for example, the physical IRQ is taken on CPU 0 and the target vCPU runs on CPU 1. I agree kvm_arch_vcpu_load() is a good place if you let vCPUs float. vgic_set_passthru_affinity can be optimized further to eliminate the free_irq()/request_irq() pair. For now it's a simple implementation; we're assuming static binding so we can start gathering performance/latency data. Will change the name as you suggest.

--
Antonios Motakis, Virtual Open Systems
Open Source KVM Virtualization Development
www.virtualopensystems.com

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
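The decoupling discussed above (a one-time passthrough setup, plus a separate affinity step driven from the point where a vCPU is loaded onto a physical CPU) can be modelled in plain userspace C. This is only a sketch under stated assumptions: `struct passthrough_irq`, `model_set_passthrough()` and `model_vcpu_load()` are hypothetical names that do not appear in the patch, and the place where the kernel would call `irq_set_affinity()` is marked by a comment rather than implemented.

```c
#include <assert.h>

#define NO_CPU (-1)

/* Hypothetical userspace model; none of these names come from the patch. */
struct passthrough_irq {
	int target_vcpu;  /* vCPU the guest routed the interrupt to */
	int host_cpu;     /* physical CPU the host IRQ is steered to, or NO_CPU */
	int retargets;    /* number of times the host IRQ was re-steered */
};

/* Setup step: runs once per assigned IRQ and does not touch affinity. */
static void model_set_passthrough(struct passthrough_irq *p, int target_vcpu)
{
	assert(target_vcpu >= 0);
	p->target_vcpu = target_vcpu;
	p->host_cpu = NO_CPU;
	p->retargets = 0;
}

/*
 * Affinity step: meant to be called where a vCPU is loaded onto a physical
 * CPU. In the kernel this is where irq_set_affinity() would be issued.
 * Returns 1 if the host IRQ was re-steered, 0 if nothing had to change.
 */
static int model_vcpu_load(struct passthrough_irq *p, int vcpu, int cpu)
{
	if (vcpu != p->target_vcpu || cpu == p->host_cpu)
		return 0;
	p->host_cpu = cpu;
	p->retargets++;
	return 1;
}
```

With this split, loading an unrelated vCPU or reloading the target vCPU on the same CPU costs nothing, which is the property that would make floating vCPUs affordable; whether it measurably improves latency is the open question raised in the thread.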
[PATCH 2/2] add initial kvm dev passhtrough support
This is the initial device pass through support. At this time only host == guest is supported. Basic operation:

- QEMU parameters: -device kvm-device-assign,host=<device name>, for example kvm-device-assign,host='arm-sp804'. Essentially any device that does PIO should be supported.
- The host DTS contains the node for the device to be passed through. The host driver is unbound or not compiled in.
- For the guest, the intent is to add a DTS node that QEMU can parse to find the guest attributes (memory resource, IRQs). For now these values default to the host's. Getting this working on boards other than vexpress is a future work item.
- The physical interrupt is always passed through to the CPU where the target vCPU executes or will execute. The current approach pins vCPUs to physical CPUs; when the guest updates the IRQ's CPU affinity, the affinity is updated in the KVM vgic dist code. A future work item for IRQ affinity is to allow vCPUs to float and to handle IRQ affinity on schedule-in. For high IRQ rates (i.e. wireless NEs) static binding may be used. For some other devices (env. mgmt IPMI) where latency is not important, dynamic binding may be used; it should be up to the user.
- To support flexible affinity a mask is introduced (QEMU param, although not used here yet):
  o vCPU affinity - vCPU -- CPU binding; the IRQ's physical CPU binding follows the vCPU binding dynamically.
- Obviously DMA is not supported. Early DMA may be supported through a 1:1 mapping, but it's unsafe, and so far we don't know of any hardware that's not behind an SMMU. This option may be useful in some embedded/wireless environments, where the guest may want to swap, secure isolation may not be an issue, or a device like a look-aside crypto engine is not behind an IOMMU.
- IOMMU/VFIO support is key and the next item for us to work on. VFIO is key especially for ETSI NFV, since 4G/IMS NEs pull packets off the wire and switch them directly in user space.

The patch has been tested on fast models in a couple of ways:

- UP guest with sp804 timer only - works consistently.
- SMP guest with sp804 timer - works consistently. Writes to '/proc/irq/<sp804 irq>/smp_affinity' confirm dynamic CPU affinity.
- IRQ rates (maybe not that important, given it's an emulated environment) reached in excess of 500.

There is a QEMU piece, very simple for now, that I will email later in case someone would like to test.

- Mario

Signed-off-by: Mario Smarduch <mario.smard...@huawei.com>
---
 arch/arm/include/asm/kvm_host.h |  14 +++
 arch/arm/include/asm/kvm_vgic.h |  10 +++
 arch/arm/kvm/Makefile           |   1 +
 arch/arm/kvm/arm.c              |  60 +
 arch/arm/kvm/assign-dev.c       | 189 +++
 arch/arm/kvm/vgic.c             | 106 ++
 include/linux/irqchip/arm-gic.h |   1 +
 include/uapi/linux/kvm.h        |  33 +++
 8 files changed, 414 insertions(+)
 create mode 100644 arch/arm/kvm/assign-dev.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 57cb786..c6ad3a3 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -67,6 +67,10 @@ struct kvm_arch {
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
+
+	/* Device Passthrough Fields */
+	struct list_head	assigned_dev_head;
+	struct mutex		dev_pasthru_lock;
 };

 #define KVM_NR_MEM_OBJS 40
@@ -146,6 +150,13 @@ struct kvm_vcpu_stat {
 	u32 halt_wakeup;
 };

+struct kvm_arm_assigned_dev_kernel {
+	struct list_head list;
+	struct kvm_arm_assigned_device dev;
+	irqreturn_t (*irq_handler)(int, void *);
+	void *irq_arg;
+};
+
 struct kvm_vcpu_init;
 int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 			const struct kvm_vcpu_init *init);
@@ -156,6 +167,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 u64 kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
+int kvm_arm_get_device_resources(struct kvm *,
+				struct kvm_arm_get_device_resources *);
+int kvm_arm_assign_device(struct kvm *, struct kvm_arm_assigned_device *);

 #define KVM_ARCH_WANT_MMU_NOTIFIER
 struct kvm;
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 343744e..c4370ae 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -107,6 +107,16 @@ struct vgic_dist {
 	/* Bitmap indicating which CPU has something pending */
 	unsigned long		irq_pending_on_cpu;
+
+	/* Device passthrough fields */
+	/* Host irq to guest irq mapping */
+	u8			guest_irq[VGIC_NR_SHARED_IRQS];
+
+	/* Pending passthruogh irq */
+	struct vgic_bitmap	pasthru_spi_pending;
+
+	/* At least one passthrough IRQ pending for some vCPU */
+	u32			pasthru_pending;
 #endif
 };
diff
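
The three fields added to struct vgic_dist suggest a simple flow: a host interrupt number is translated to a guest interrupt number through the mapping array, and the result is recorded as pending. The userspace sketch below models that flow only. It is not taken from the patch: `struct passthrough_state`, `map_irq()` and `mark_pending()` are hypothetical names, the sizes 32 and 96 are assumed stand-ins for VGIC_NR_PRIVATE_IRQS and VGIC_NR_SHARED_IRQS, and indexing the pending bitmap by guest interrupt number is an assumption, since the truncated diff does not show how the bitmap is used.

```c
#include <assert.h>
#include <stdint.h>

#define NR_PRIVATE_IRQS 32  /* assumed stand-in for VGIC_NR_PRIVATE_IRQS */
#define NR_SHARED_IRQS  96  /* assumed stand-in for VGIC_NR_SHARED_IRQS */

/* Hypothetical model of the passthrough fields added to struct vgic_dist. */
struct passthrough_state {
	uint8_t  guest_irq[NR_SHARED_IRQS];         /* host SPI index -> guest IRQ */
	uint32_t spi_pending[NR_SHARED_IRQS / 32];  /* one bit per guest SPI */
	uint32_t any_pending;                       /* non-zero if any bit is set */
};

/* Record that host interrupt hwirq is delivered to the guest as girq. */
static void map_irq(struct passthrough_state *s, int hwirq, int girq)
{
	assert(hwirq >= NR_PRIVATE_IRQS && hwirq < NR_PRIVATE_IRQS + NR_SHARED_IRQS);
	assert(girq >= NR_PRIVATE_IRQS && girq < NR_PRIVATE_IRQS + NR_SHARED_IRQS);
	s->guest_irq[hwirq - NR_PRIVATE_IRQS] = (uint8_t)girq;
}

/* Host handler side: translate hwirq, mark it pending, return the guest IRQ. */
static int mark_pending(struct passthrough_state *s, int hwirq)
{
	int girq = s->guest_irq[hwirq - NR_PRIVATE_IRQS];
	int bit = girq - NR_PRIVATE_IRQS;

	s->spi_pending[bit / 32] |= UINT32_C(1) << (bit % 32);
	s->any_pending = 1;
	return girq;
}
```

A u8 entry is wide enough here because every shared interrupt number in this model is below 128; with host == guest, as the cover letter describes, the table is simply an identity mapping.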
Re: [PATCH 2/2] add initial kvm dev passhtrough support
On 11.06.2013 at 09:43, Mario Smarduch <mario.smard...@huawei.com> wrote:

> This is the initial device pass through support. At this time host == guest only is supported. Basic Operation:
> - QEMU parameters: -device kvm-device-assign,host=<device name>, for example kvm-device-assign,host='arm-sp804'. Essentially any device that does PIO should be supported.

Yikes! Over the last few years we've worked very hard to get rid of the unfortunate intertwining of device assignment and KVM. There are a number of reasons it's a bad idea:

- kvm access is a potential privilege escalation
- device assignment is limited to kvm

The solution to both of the above is VFIO. You get a completely separate interface for accessing your devices, with a few connecting bits (irqfd, eventfd) to communicate quickly between vfio and kvm.

Is there any particular reason you're not going down that path for your ARM implementation?

On the embedded PPC side we've been discussing vfio and how it fits into a device tree, non-PCI world for a while. If you like, we can dive into more detail on that, either via email or via phone.

Alex
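
The eventfd mentioned above as one of the connecting bits is an ordinary Linux file descriptor backed by a 64-bit counter. The sketch below demonstrates only that primitive through the eventfd(2) interface: a producer adds to the counter once per event, and a consumer collects every accumulated event with a single read. It does not show the VFIO or KVM irqfd wiring, and `signal_events()` and `drain_events()` are hypothetical helper names used for illustration.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Producer side: add 1 to the eventfd counter once per event. */
static int signal_events(int efd, unsigned int count)
{
	uint64_t one = 1;
	unsigned int i;

	for (i = 0; i < count; i++)
		if (write(efd, &one, sizeof(one)) != (ssize_t)sizeof(one))
			return -1;
	return 0;
}

/*
 * Consumer side: a single read returns the accumulated count and resets the
 * counter to zero (default, non-semaphore mode). Returns -1 on failure.
 */
static int64_t drain_events(int efd)
{
	uint64_t value;

	if (read(efd, &value, sizeof(value)) != (ssize_t)sizeof(value))
		return -1;
	return (int64_t)value;
}
```

A caller would create the descriptor with eventfd(0, 0), hand it to the producer and consumer, and close() it when done. Because the counter accumulates, events signalled while the consumer is busy are not lost, only coalesced into one read.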
Re: [PATCH 2/2] add initial kvm dev passhtrough support
On 6/11/2013 10:28 AM, Alexander Graf wrote:

> Is there any particular reason you're not going down that path for your ARM implementation?

We see this as a good starting point to build on. We need baseline numbers for performance, latency, and interrupt throughput on real hardware ASAP to build competency for NFV, which has demanding Dev. Passthrough requirements. Over time we plan on contributing to SMMU and VFIO as well (we're looking into this now).

FYI, NFV (i.e. Network Function Virtualization) is an initiative wireless/fixed network operators are working towards: to virtualize Core, likely Radio Access, and even Home Network equipment. This is an epic undertaking. So far VMware has taken the lead (mostly x86).

> On the embedded PPC side we've been discussing vfio and how it fits into a device tree, non-PCI world for a while. If you like, we can dive into more detail on that, either via email or via phone.

I'll email you offline. I'd like to know more about what you've done on this and see where we can align/leverage the effort.

- Mario

> Alex
Re: [PATCH 2/2] add initial kvm dev passhtrough support
On Tue, 2013-06-11 at 16:13 +0200, Mario Smarduch wrote:

> On 6/11/2013 10:28 AM, Alexander Graf wrote:
>
>> Is there any particular reason you're not going down that path for your ARM implementation?
>
> We see this as a good starting point to build on. We need baseline numbers for performance, latency, and interrupt throughput on real hardware ASAP to build competency for NFV, which has demanding Dev. Passthrough requirements. Over time we plan on contributing to SMMU and VFIO as well (we're looking into this now).
>
> FYI, NFV (i.e. Network Function Virtualization) is an initiative wireless/fixed network operators are working towards: to virtualize Core, likely Radio Access, and even Home Network equipment. This is an epic undertaking. So far VMware has taken the lead (mostly x86).
>
>> On the embedded PPC side we've been discussing vfio and how it fits into a device tree, non-PCI world for a while. If you like, we can dive into more detail on that, either via email or via phone.
>
> I'll email you offline. I'd like to know more about what you've done on this and see where we can align/leverage the effort.

Yes, please let's use VFIO rather than continue to use or invent new device assignment interfaces for KVM. Antonios Motakis (cc'd) already contacted me about VFIO for ARM. IIRC, his initial impression was that the IOMMU backend was almost entirely reusable for ARM (with a couple of PCI assumptions implicit in the IOMMU API to handle), and my hope was that ARM and PPC could work together on a common VFIO device tree backend.

Thanks,
Alex
Re: [PATCH 2/2] add initial kvm dev passhtrough support
I know Antonios very well. Yes, our intent is definitely to use VFIO.

- Mario

On 6/11/2013 4:52 PM, Alex Williamson wrote:

> On Tue, 2013-06-11 at 16:13 +0200, Mario Smarduch wrote:
>
>> On 6/11/2013 10:28 AM, Alexander Graf wrote:
>>
>>> Is there any particular reason you're not going down that path for your ARM implementation?
>>
>> We see this as a good starting point to build on. We need baseline numbers for performance, latency, and interrupt throughput on real hardware ASAP to build competency for NFV, which has demanding Dev. Passthrough requirements. Over time we plan on contributing to SMMU and VFIO as well (we're looking into this now).
>>
>> FYI, NFV (i.e. Network Function Virtualization) is an initiative wireless/fixed network operators are working towards: to virtualize Core, likely Radio Access, and even Home Network equipment. This is an epic undertaking. So far VMware has taken the lead (mostly x86).
>>
>>> On the embedded PPC side we've been discussing vfio and how it fits into a device tree, non-PCI world for a while. If you like, we can dive into more detail on that, either via email or via phone.
>>
>> I'll email you offline. I'd like to know more about what you've done on this and see where we can align/leverage the effort.
>
> Yes, please let's use VFIO rather than continue to use or invent new device assignment interfaces for KVM. Antonios Motakis (cc'd) already contacted me about VFIO for ARM. IIRC, his initial impression was that the IOMMU backend was almost entirely reusable for ARM (with a couple of PCI assumptions implicit in the IOMMU API to handle), and my hope was that ARM and PPC could work together on a common VFIO device tree backend.
>
> Thanks,
> Alex