On 16/10/2016 05:21, Michael S. Tsirkin wrote:
> On Fri, Oct 14, 2016 at 08:21:27PM +0200, Paolo Bonzini wrote:
>> On some benchmarks (e.g. netperf with ioeventfd disabled), APICv
>> posted interrupts turn out to be slower than interrupt injection via
>> KVM_REQ_EVENT.
>>
>> This patch optimizes a bit the IRR update, avoiding expensive atomic
>> operations in the common case where PI.ON=0 at vmentry or the PIR vector
>> is mostly zero.  This saves at least 20 cycles (1%) per vmexit, as
>> measured by kvm-unit-tests' inl_from_qemu test (20 runs):
>>
>>               | enable_apicv=1  |  enable_apicv=0
>>               | mean     stdev  |  mean     stdev
>>     ----------|-----------------|------------------
>>     before    | 5826     32.65  |  5765     47.09
>>     after     | 5809     43.42  |  5777     77.02
>>
>> Of course, any change in the right column is just placebo effect. :)
>> The savings are bigger if interrupts are frequent.
>>
>> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
>> ---
>>  arch/x86/kvm/lapic.c | 6 ++++--
>>  arch/x86/kvm/vmx.c   | 9 ++++++++-
>>  2 files changed, 12 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index 23b99f305382..63a442aefc12 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -342,9 +342,11 @@ void __kvm_apic_update_irr(u32 *pir, void *regs)
>>      u32 i, pir_val;
>>  
>>      for (i = 0; i <= 7; i++) {
>> -            pir_val = xchg(&pir[i], 0);
>> -            if (pir_val)
>> +            pir_val = READ_ONCE(pir[i]);
>> +            if (pir_val) {
>> +                    pir_val = xchg(&pir[i], 0);
>>                      *((u32 *)(regs + APIC_IRR + i * 0x10)) |= pir_val;
>> +            }
>>      }
>>  }
>>  EXPORT_SYMBOL_GPL(__kvm_apic_update_irr);
> 
> gcc doesn't seem to unroll this loop and it's
> probably worth unrolling it
> 
> The following seems to do the trick for me on upstream - I didn't
> benchmark it though. Is there a kvm unit test for interrupts?

No, not yet.  Purely from a branch-prediction point of view I wouldn't
be so sure that it's worth unrolling it.  Because of the xchg you cannot
make the code branchless, and having a branch per iteration puts some
pressure on the BTB.

This is only a hot path in workloads that have a lot of interrupts and a
lot of vmexits.  If it's always the same interrupt (actually the same
group of 32 interrupts) that triggers, then the branch predictor will
predict the history very easily (e.g. 00001000 00001000 00001000...).

Paolo

> --->
> 
> kvm: unroll the loop in __kvm_apic_update_irr.
> 
> This is hot data path in interrupt-rich workloads, worth unrolling.
> 
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> 
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index b62c852..0c3462c 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -337,7 +337,8 @@ static u8 count_vectors(void *bitmap)
>       return count;
>  }
>  
> -void __kvm_apic_update_irr(u32 *pir, void *regs)
> +void __attribute__((optimize("unroll-loops")))
> +__kvm_apic_update_irr(u32 *pir, void *regs)
>  {
>       u32 i, pir_val;
>  
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Reply via email to