On Thu, Jan 16, 2014 at 01:44:20PM +0100, Christian Borntraeger wrote:
> When starting lots of dataplane devices the bootup takes very long on my
> s390 system (prototype irqfd code). With larger setups we are even able
> to trigger some timeouts in some components.
> Turns out that the KVM_SET_GSI_ROUTING ioctl takes very
> long (strace claims up to 0.1 sec) when having multiple CPUs.
> This is caused by the synchronize_rcu and the HZ=100 of s390.
> By changing the code to use a private srcu we can speed things up.
>
> This patch reduces the boot time till mounting root from 8 to 2
> seconds on my s390 guest with 100 disks.
>
> I converted most of the rcu routines to srcu. Review for the unconverted
> use of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu
> is necessary, though. They look fine to me since they are protected by
> outer functions.
>
> In addition, we should also discuss if a global srcu (for all guests) is
> fine.
>
> Signed-off-by: Christian Borntraeger <[email protected]>
That's nice, but did you measure the overhead on interrupt-intensive
workloads, such as RX with 10G Ethernet?
SRCU read-side locks aren't free the way RCU ones are.
> ---
> virt/kvm/irqchip.c | 31 +++++++++++++++++--------------
> 1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/virt/kvm/irqchip.c b/virt/kvm/irqchip.c
> index 20dc9e4..5283eb8 100644
> --- a/virt/kvm/irqchip.c
> +++ b/virt/kvm/irqchip.c
> @@ -26,17 +26,20 @@
>
> #include <linux/kvm_host.h>
> #include <linux/slab.h>
> +#include <linux/srcu.h>
> #include <linux/export.h>
> #include <trace/events/kvm.h>
> #include "irq.h"
>
> +DEFINE_STATIC_SRCU(irq_srcu);
> +
> bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
> {
> struct kvm_irq_ack_notifier *kian;
> - int gsi;
> + int gsi, idx;
>
> - rcu_read_lock();
> - gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> + idx = srcu_read_lock(&irq_srcu);
> + gsi = srcu_dereference(kvm->irq_routing, &irq_srcu)->chip[irqchip][pin];
> if (gsi != -1)
> hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
> link)
> @@ -45,7 +48,7 @@ bool kvm_irq_has_notifier(struct kvm *kvm, unsigned irqchip, unsigned pin)
> return true;
> }
>
> - rcu_read_unlock();
> + srcu_read_unlock(&irq_srcu, idx);
>
> return false;
> }
> @@ -54,18 +57,18 @@ EXPORT_SYMBOL_GPL(kvm_irq_has_notifier);
> void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin)
> {
> struct kvm_irq_ack_notifier *kian;
> - int gsi;
> + int gsi, idx;
>
> trace_kvm_ack_irq(irqchip, pin);
>
> - rcu_read_lock();
> - gsi = rcu_dereference(kvm->irq_routing)->chip[irqchip][pin];
> + idx = srcu_read_lock(&irq_srcu);
> + gsi = srcu_dereference(kvm->irq_routing, &irq_srcu)->chip[irqchip][pin];
> if (gsi != -1)
> hlist_for_each_entry_rcu(kian, &kvm->irq_ack_notifier_list,
> link)
> if (kian->gsi == gsi)
> kian->irq_acked(kian);
> - rcu_read_unlock();
> + srcu_read_unlock(&irq_srcu, idx);
> }
>
> void kvm_register_irq_ack_notifier(struct kvm *kvm,
> @@ -85,7 +88,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm,
> mutex_lock(&kvm->irq_lock);
> hlist_del_init_rcu(&kian->link);
> mutex_unlock(&kvm->irq_lock);
> - synchronize_rcu();
> + synchronize_srcu_expedited(&irq_srcu);
> #ifdef __KVM_HAVE_IOAPIC
> kvm_vcpu_request_scan_ioapic(kvm);
> #endif
> @@ -115,7 +118,7 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
> bool line_status)
> {
> struct kvm_kernel_irq_routing_entry *e, irq_set[KVM_NR_IRQCHIPS];
> - int ret = -1, i = 0;
> + int ret = -1, i = 0, idx;
> struct kvm_irq_routing_table *irq_rt;
>
> trace_kvm_set_irq(irq, level, irq_source_id);
> @@ -124,12 +127,12 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level,
> * IOAPIC. So set the bit in both. The guest will ignore
> * writes to the unused one.
> */
> - rcu_read_lock();
> - irq_rt = rcu_dereference(kvm->irq_routing);
> + idx = srcu_read_lock(&irq_srcu);
> + irq_rt = srcu_dereference(kvm->irq_routing, &irq_srcu);
> if (irq < irq_rt->nr_rt_entries)
> hlist_for_each_entry(e, &irq_rt->map[irq], link)
> irq_set[i++] = *e;
> - rcu_read_unlock();
> + srcu_read_unlock(&irq_srcu, idx);
>
> while(i--) {
> int r;
> @@ -226,7 +229,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
> kvm_irq_routing_update(kvm, new);
> mutex_unlock(&kvm->irq_lock);
>
> - synchronize_rcu();
> + synchronize_srcu_expedited(&irq_srcu);
Hmm, it's a bit strange that you also use the _expedited variant here.
What if this synchronize_rcu is replaced by synchronize_rcu_expedited
and no other changes are made?
Maybe that's enough?
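To be concrete, the minimal alternative would be a one-line change in
kvm_set_irq_routing, leaving all the rcu_read_lock users as they are.
This is an untested sketch just to show the idea; whether the expedited
grace period is fast enough on s390 with HZ=100 would need measuring:

	-	synchronize_rcu();
	+	synchronize_rcu_expedited();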
>
> new = old;
> r = 0;
> --
> 1.8.4.2