Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-12 Thread Mario Smarduch
Resending, initial email from my exchange client got rejected
due to HTML content

On 6/12/2013 8:45 AM, Mario Smarduch wrote:
  
 
Hi Antonios, 
 Thanks for your feedback. Initially we'll work with static binding to
 gather performance data, since latency/throughput is key, and later add
 dynamic binding (as well as re-optimize the affinity code). And, as you
 already know, we'll move towards VFIO, which is a longer-term effort.
 
 
 +struct kvm_arm_assigned_dev_kernel {
 +   struct list_head list;
 +   struct kvm_arm_assigned_device dev;
 +   irqreturn_t (*irq_handler)(int, void *);
 +   void *irq_arg;
 +};
 +
 
  
 
 Instead of irq_arg, isn't something such as target_vcpu more clear?
 
  
 
MS Agree.
 
  
 
 diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
 index 17c5ac7..f4cb804 100644
 --- a/arch/arm/kvm/vgic.c
 +++ b/arch/arm/kvm/vgic.c
 @@ -449,6 +449,41 @@ static u32 vgic_get_target_reg(struct kvm *kvm, int irq)
  	return val;
  }
 
 +/* Follow the IRQ vCPU affinity so passthrough device interrupts are injected
 + * on physical CPU they execute.
 + */
 +static void vgic_set_passthru_affinity(struct kvm *kvm, int irq, u32 target)
 +{
 +	struct list_head *dev_list_ptr = &kvm->arch.assigned_dev_head;
 +	struct list_head *ptr;
 +	struct kvm_arm_assigned_dev_kernel *assigned_dev;
 +	struct vgic_dist *dist = &kvm->arch.vgic;
 +	char *buf;
 +	int cpu, hwirq;
 +
 +	mutex_lock(&kvm->arch.dev_pasthru_lock);
 +	list_for_each(ptr, dev_list_ptr) {
 +		assigned_dev = list_entry(ptr,
 +				struct kvm_arm_assigned_dev_kernel, list);
 +		if (assigned_dev->dev.guest_res.girq == irq) {
 +			if (assigned_dev->irq_arg)
 +				free_irq(irq, assigned_dev->irq_arg);
 +			cpu = kvm->vcpus[target]->cpu;
 +			hwirq = assigned_dev->dev.dev_res.hostirq.hwirq;
 +			irq_set_affinity(hwirq, cpumask_of(cpu));
 +			assigned_dev->irq_arg = kvm->vcpus[target];
 +			buf = assigned_dev->dev.dev_res.hostirq.host_name;
 +			sprintf(buf, "%s-KVM Pass-through",
 +					assigned_dev->dev.dev_res.devname);
 +			gic_spi_set_priodrop(hwirq);
 +			dist->guest_irq[hwirq - VGIC_NR_PRIVATE_IRQS] = irq;
 +			request_irq(hwirq, assigned_dev->irq_handler, 0, buf,
 +					assigned_dev->irq_arg);
 +		}
 +	}
 +	mutex_unlock(&kvm->arch.dev_pasthru_lock);
 +}
 +
 
  
 
 Maybe vgic_set_passthru_affinity is not an ideal name for the function, since 
 you do more than that here.
 
 After looking at your code, I think things will be much easier if you decouple 
 the host IRQ affinity bits from here. After that, there is not much stopping 
 the affinity from following the CPU a vCPU will execute on.
 
 I would rename this to reflect that you enable priodrop for this IRQ here; 
 for example, just vgic_set_passthrough could suffice (I don't like the 
 pasthru abbreviation a lot). Then the affinity bits can be put in a 
 different function.
 
  
 
MJS Agree naming could be better.
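
As an illustration of the split discussed above, here is a minimal userspace sketch with mock types. It is not code from the patch: `struct passthru_dev`, `set_passthrough()` and `set_host_affinity()` are hypothetical names, and the real kernel calls (priority-drop setup, `irq_set_affinity()`) are only represented by plain field updates.

```c
#include <stdbool.h>

/* Mock of per-device passthrough state; field names are illustrative only. */
struct passthru_dev {
	int guest_irq;   /* IRQ number seen by the guest, -1 if unmapped */
	int host_irq;    /* physical IRQ on the host */
	int target_cpu;  /* physical CPU the host IRQ is routed to */
	bool priodrop;   /* true once priority drop has been enabled */
};

/* Step 1: one-time passthrough setup (guest mapping + priority drop). */
static void set_passthrough(struct passthru_dev *d, int guest_irq)
{
	d->guest_irq = guest_irq;
	d->priodrop = true;
}

/*
 * Step 2: affinity only. Returns true if the route actually changed,
 * so callers can skip redundant work when the CPU is unchanged.
 */
static bool set_host_affinity(struct passthru_dev *d, int cpu)
{
	if (d->target_cpu == cpu)
		return false;
	d->target_cpu = cpu;
	return true;
}
```

With the two steps separated, the affinity helper can be called from any place that learns about a new physical CPU without repeating the passthrough setup.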
 
 
 
 In arch/arm/kvm/arm.c kvm_arch_vcpu_load() you can follow up whenever a vcpu 
 is moved to a different cpu. However in practice I don't know if the 
 additional complexity of having the irq affinity follow the vcpu 
 significantly improves irq latency.
 
  
 
MJS  This should save a costly IPI if, for example, the physical IRQ is taken
on CPU 0 and the target vCPU runs on CPU 1. I agree kvm_arch_vcpu_load() is a
good place if you let vCPUs float. vgic_set_passthru_affinity can be optimized
further to eliminate the free_irq()/request_irq() pair. For now it's a simple
implementation: we're assuming static binding so we can start gathering
performance/latency data. Will change the name as you suggest.
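
The idea of letting the IRQ route follow a floating vCPU can be sketched as below. This is a userspace mock under stated assumptions, not the patch's implementation: `struct mock_vcpu`, `struct mock_assigned_irq` and `vcpu_load_follow_affinity()` are hypothetical, and the place where a kernel would call `irq_set_affinity()` is marked in a comment.

```c
#include <stddef.h>

/* Mock vCPU: only the fields needed for the sketch. */
struct mock_vcpu {
	int id;
	int cpu;  /* physical CPU the vCPU was last loaded on */
};

/* Mock assigned IRQ: which vCPU it targets and where it is routed. */
struct mock_assigned_irq {
	int host_irq;
	const struct mock_vcpu *target;  /* vCPU that receives this IRQ */
	int routed_cpu;                  /* physical CPU the IRQ is routed to */
};

/*
 * Called when a vCPU is scheduled onto `cpu`. Re-routes only the IRQs
 * that target this vCPU and are not already routed to `cpu`.
 * Returns the number of IRQs that were re-routed.
 */
static int vcpu_load_follow_affinity(struct mock_vcpu *vcpu, int cpu,
				     struct mock_assigned_irq *irqs, size_t n)
{
	int moved = 0;
	size_t i;

	vcpu->cpu = cpu;
	for (i = 0; i < n; i++) {
		if (irqs[i].target != vcpu || irqs[i].routed_cpu == cpu)
			continue;
		/* A kernel version would call irq_set_affinity() here. */
		irqs[i].routed_cpu = cpu;
		moved++;
	}
	return moved;
}
```

Because only the route is updated, a hook of this shape would not need to free and re-request the interrupt each time the vCPU moves.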
 
 
 
 
 -- 
 
 Antonios Motakis, Virtual Open Systems
 Open Source KVM Virtualization Development
 www.virtualopensystems.com
 


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Mario Smarduch

This is the initial device passthrough support.
At this time only host == guest is supported.
Basic Operation:

- QEMU parameters: -device kvm-device-assign,host=<device name>
  for example - kvm-device-assign,host='arm-sp804'. Essentially
  any device that does PIO should be supported.
- The host DTS contains the node for the device to be passed through.
  The host driver is unbound or not compiled in.
- For the guest, the intent is to add a DTS node that QEMU can
  parse to find the guest attributes (memory resources, IRQs).
  For now these values default to the host's. Getting this working
  on boards other than vexpress is a future work item.
- The physical interrupt is always passed through to the CPU
  where the target vCPU executes or will execute.
  Current approach: vCPUs are pinned to physical CPUs; when the
  guest updates CPU affinity, it is updated in the KVM vgic dist
  code. A future work item for IRQ affinity is to allow vCPUs to
  float and to handle IRQ affinity when they are scheduled in. For
  high IRQ rates (i.e. wireless NEs) static binding may be used.
  For some other devices (env. mgmt, IPMI) where latency is not
  important, dynamic binding may be used; it should be up to the user.
- To support flexible affinity a mask is introduced (QEMU param)
  (although not used here yet)
  o vCPU affinity - vCPU-to-CPU binding; the physical IRQ's
    CPU binding follows the vCPU binding dynamically.
- Obviously DMA is not supported
  - early DMA may be supported through a 1:1 mapping but it's unsafe
    and so far we don't know of any hardware that's not behind an SMMU.
    This option may be useful in some embedded/wireless environments,
    where the guest may want to swap, secure isolation may not be
    an issue or a device like a look-aside crypto engine is not behind
    an IOMMU.
  - IOMMU/VFIO support is key and the next item for us to work on.
    Especially for ETSI NFV, VFIO is key since 4G/IMS NEs pull packets
    off the wire and switch them directly in user space.

The patch has been tested on Fast Models in a couple of ways:
- UP guest with sp804 timer only - works consistently.
- SMP guest with sp804 timer - works consistently.
  Writes to '/proc/irq/<sp804 irq>/smp_affinity'
  confirm dynamic CPU affinity.
- IRQ rates (maybe not that important given it's an emulated env) reached
  in excess of 500.
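
For anyone reproducing the smp_affinity test, the file takes a hexadecimal CPU bitmask. The helper below is a small sketch that only builds the path and the mask string for routing an IRQ to a single CPU; `format_smp_affinity()` is a hypothetical name, the IRQ number is whatever the guest reports for the sp804, and the 32-CPU limit is an assumption made to keep the example short.

```c
#include <stdio.h>

/*
 * Build the /proc path and the hex CPU mask for routing `irq` to one `cpu`.
 * Returns 0 on success, -1 if an argument is out of range (this sketch
 * only handles CPUs 0..31) or a buffer is too small.
 */
static int format_smp_affinity(int irq, int cpu,
			       char *path, size_t path_len,
			       char *mask, size_t mask_len)
{
	int n;

	if (irq < 0 || cpu < 0 || cpu > 31)
		return -1;
	n = snprintf(path, path_len, "/proc/irq/%d/smp_affinity", irq);
	if (n < 0 || (size_t)n >= path_len)
		return -1;
	/* One bit per CPU: CPU 0 -> "1", CPU 1 -> "2", CPU 2 -> "4", ... */
	n = snprintf(mask, mask_len, "%x", 1u << cpu);
	if (n < 0 || (size_t)n >= mask_len)
		return -1;
	return 0;
}
```

Writing the resulting mask string to the resulting path (as root, inside the guest) is what moves the interrupt to the chosen CPU.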

There is a QEMU piece, very simple for now, that I will
email later in case someone would like to test.

- Mario



Signed-off-by: Mario Smarduch mario.smard...@huawei.com
---
 arch/arm/include/asm/kvm_host.h |   14 +++
 arch/arm/include/asm/kvm_vgic.h |   10 +++
 arch/arm/kvm/Makefile           |    1 +
 arch/arm/kvm/arm.c              |   60 +
 arch/arm/kvm/assign-dev.c       |  189 +++
 arch/arm/kvm/vgic.c             |  106 ++
 include/linux/irqchip/arm-gic.h |    1 +
 include/uapi/linux/kvm.h        |   33 +++
 8 files changed, 414 insertions(+)
 create mode 100644 arch/arm/kvm/assign-dev.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 57cb786..c6ad3a3 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -67,6 +67,10 @@ struct kvm_arch {
 
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
+
+	/* Device Passthrough Fields */
+	struct list_head	assigned_dev_head;
+	struct mutex		dev_pasthru_lock;
 };
 
 #define KVM_NR_MEM_OBJS 40
@@ -146,6 +150,13 @@ struct kvm_vcpu_stat {
u32 halt_wakeup;
 };
 
+struct kvm_arm_assigned_dev_kernel {
+   struct list_head list;
+   struct kvm_arm_assigned_device dev;
+   irqreturn_t (*irq_handler)(int, void *);
+   void *irq_arg;
+};
+
 struct kvm_vcpu_init;
 int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 			const struct kvm_vcpu_init *init);
@@ -156,6 +167,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 u64 kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
+int kvm_arm_get_device_resources(struct kvm *,
+   struct kvm_arm_get_device_resources *);
+int kvm_arm_assign_device(struct kvm *, struct kvm_arm_assigned_device *);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 struct kvm;
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 343744e..c4370ae 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -107,6 +107,16 @@ struct vgic_dist {
 
 	/* Bitmap indicating which CPU has something pending */
 	unsigned long		irq_pending_on_cpu;
+
+	/* Device passthrough fields */
+	/* Host irq to guest irq mapping */
+	u8			guest_irq[VGIC_NR_SHARED_IRQS];
+
+	/* Pending passthrough irq */
+	struct vgic_bitmap	pasthru_spi_pending;
+
+	/* At least one passthrough IRQ pending for some vCPU */
+	u32			pasthru_pending;
 #endif
 };
 
diff 

Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Alexander Graf

On 11.06.2013 at 09:43, Mario Smarduch mario.smard...@huawei.com wrote:

 
 This is the initial device pass through support.
 At this time host == guest only is supported.
 Basic Operation:
 
 - QEMU parameters: -device kvm-device-assign,host=device name
  for example - kvm-device-assign,host='arm-sp804'. Essentially
  any device that does PIO should be supported.

Yikes!

Over the last few years we've worked very hard to get rid of the unfortunate 
intertwining of device assignment and KVM. There are a number of reasons it's a 
bad idea:

  - kvm access is a potential privilege escalation
  - device assignment is limited to kvm

The solution to both of the above is VFIO. You get a completely separate 
interface for accessing your devices with a few connecting bits (irqfd, 
eventfd) to communicate quickly between vfio and kvm.
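
To make the eventfd part concrete, here is a minimal userspace sketch of the counter semantics that such signalling builds on. It assumes Linux and uses only `eventfd(2)`, `read(2)` and `write(2)`; it does not touch VFIO or KVM, and `signal_and_drain()` is a hypothetical helper name. Wiring an eventfd to an interrupt source or to a guest IRQ needs further setup that is not shown here.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Signal an eventfd `times` times, then drain it with a single read.
 * Returns the counter value read back, or -1 on error. The fd is
 * non-blocking, so draining an unsignalled eventfd also returns -1.
 */
static long long signal_and_drain(int times)
{
	uint64_t one = 1, count = 0;
	int i, fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return -1;
	for (i = 0; i < times; i++) {
		/* Each write adds its 8-byte value to the kernel counter. */
		if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
			close(fd);
			return -1;
		}
	}
	/* One read returns the accumulated count and resets it to zero. */
	if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
		close(fd);
		return -1;
	}
	close(fd);
	return (long long)count;
}
```

The point of the sketch is that several signals coalesce into one wakeup: the reader sees how many events arrived without one system call per event.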

Is there any particular reason you're not going down that path for your ARM 
implementation?

On the embedded PPC side we've been discussing vfio and how it fits into a 
device tree, non-PCI world for a while. If you like, we can dive into more 
detail on that, either via email or via phone.


Alex



Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Mario Smarduch

On 6/11/2013 10:28 AM, Alexander Graf wrote:

 
 Is there any particular reason you're not going down that path for your ARM 
 implementation?

We see this as a good starting point to build on. We need baseline numbers
for performance, latency and interrupt throughput on real hardware
ASAP to build competency for NFV, which has demanding device passthrough
requirements. Over time we plan on contributing to SMMU and VFIO as well
(we're looking into this now).

FYI, NFV (Network Function Virtualization) is an initiative wireless/fixed
network operators are working towards - to virtualize Core, likely Radio
Access and even Home Network equipment. This is an epic undertaking.
So far VMware has taken the lead (mostly x86).
 
 
 On the embedded PPC side we've been discussing vfio and how it fits into a 
 device tree, non-PCI world for a while. If you like, we can dive into more 
 detail on that, either via email or via phone.

I'll email you offline. I'd like to know more about what you've done on this
and see where we can align/leverage the effort.

- Mario
 
 
 Alex
 




Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Alex Williamson
On Tue, 2013-06-11 at 16:13 +0200, Mario Smarduch wrote:
 On 6/11/2013 10:28 AM, Alexander Graf wrote:
 
  
  Is there any particular reason you're not going down that path for your ARM 
  implementation?
 
 We see this as a good starting point to build on, we need baseline numbers
 for performance, latency, interrupt throughput on real hardware
 ASAP to build competency for NFV, which has demanding Dev. Passthrough
 requirements. Over time we plan contributing to SMMU and VFIO as well
 (we're looking into this now).
 
 FYI NFV is an initiative wireless/fixed network operators are working 
 towards - to virtualize Core, likely Radio Access and even Home Network 
 equipment, this is an epic undertaking (i.e. Network Function Virtualization). 
 So far VMware has taken the lead (mostly x86).
  
  
  On the embedded PPC side we've been discussing vfio and how it fits into a 
  device tree, non-PCI world for a while. If you like, we can dive into more 
  detail on that, either via email or via phone.
 
 I'll email you offline, I'd like to know more what you've done on this
 and see where we can align/leverage the effort.

Yes, please let's use VFIO rather than continue to use or invent new
device assignment interfaces for KVM.  Antonios Motakis (cc'd) already
contacted me about VFIO for ARM.  IIRC, his initial impression was that
the IOMMU backend was almost entirely reusable for ARM (a couple PCI
assumptions implicit in the IOMMU API to handle) and my hope was that
ARM and PPC could work together on a common VFIO device tree backend.
Thanks,

Alex



Re: [PATCH 2/2] add initial kvm dev passhtrough support

2013-06-11 Thread Mario Smarduch

I know Antonios very well. Yes our intent is definitely to use VFIO.

- Mario 

On 6/11/2013 4:52 PM, Alex Williamson wrote:
 On Tue, 2013-06-11 at 16:13 +0200, Mario Smarduch wrote:
 On 6/11/2013 10:28 AM, Alexander Graf wrote:


 Is there any particular reason you're not going down that path for your ARM 
 implementation?

 We see this as a good starting point to build on, we need baseline numbers
 for performance, latency, interrupt throughput on real hardware
 ASAP to build competency for NFV, which has demanding Dev. Passthrough
 requirements. Over time we plan contributing to SMMU and VFIO as well
 (we're looking into this now).

 FYI NFV is an initiative wireless/fixed network operators are working 
 towards - to virtualize Core, likely Radio Access and even Home Network 
 equipment, this is an epic undertaking (i.e. Network Function 
 Virtualization). 
 So far VMware has taken the lead (mostly x86).
  

 On the embedded PPC side we've been discussing vfio and how it fits into a 
 device tree, non-PCI world for a while. If you like, we can dive into more 
 detail on that, either via email or via phone.

 I'll email you offline, I'd like to know more what you've done on this
 and see where we can align/leverage the effort.
 
 Yes, please let's use VFIO rather than continue to use or invent new
 device assignment interfaces for KVM.  Antonios Motakis (cc'd) already
 contacted me about VFIO for ARM.  IIRC, his initial impression was that
 the IOMMU backend was almost entirely reusable for ARM (a couple PCI
 assumptions implicit in the IOMMU API to handle) and my hope was that
 ARM and PPC could work together on a common VFIO device tree backend.
 Thanks,
 
 Alex
 

