On 27.04.2012, at 13:23, Alexander Graf wrote:
>
> On 27.04.2012, at 07:48, Paul Mackerras wrote:
>
>> On Thu, Apr 26, 2012 at 12:19:03PM +0200, Alexander Graf wrote:
>>
>>> So switch the code over to call into the Linux C handlers from C code,
>>> speeding up everything along the way.
>>
>> I have to say this patch makes me pretty uneasy. There are a few
>> things that look wrong to me, but more than that, it seems to me that
>> there would be a lot of careful thought needed to make sure that the
>> approach is bullet-proof.
>
> Yay, finally some review on it :). This method is currently used identically
> in booke hv, so everything we find broken here also applies there!
>
>> The first thing is that you are filling in the registers, and in
>> particular r1, in a subroutine, so you are potentially making regs.r1
>> point to a stack frame that no longer exists by the time we look at it
>> inside do_IRQ or timer_interrupt. So, for example, a stack trace
>> could go completely off the rails at that point. Quite possibly gcc
>> will inline the kvmppc_fill_pt_regs function, which would probably
>> save you, but I don't think you should rely on that.
>
> Ugh. Right.
>
>> The second thing is, why do you save just r1, ip, msr, and lr? Why
>> those ones and no others? A performance monitor interrupt might well
>> decide to save all the registers away plus a stack trace, and to see
>> all the GPRs as 0 could be very confusing.
>
> Well, any other state at that point is pretty useless, since we've long
> deferred from the original IP the interrupt arrived at. So if a perfmon
> module reads out other GPRs there, they are basically guaranteed to be
> useless anyway, no?
>
>> Thirdly, if preemption is enabled, it could well be that the interrupt
>> will wake up a higher priority task which should run before we
>> continue. On 64-bit you are probably saved by the soft_irq_enable
>> calls, which will (I think) call schedule() if a reschedule is
>> pending, but on 32-bit soft_irq_enable does nothing.
>
> At that point preemption is disabled.
>
>> Fourthly, as Ben said, you should be setting regs->trap.
>
> Yup :). Very good catch that one.
>
>> Have you measured a performance improvement with this patch? If so
>> how big was it?
>
> Yeah, I tried things on 970 in an mfsprg/mtsprg busy loop. I measured 3
> different variants:
>
> C irq handling: 1004944 exits/sec
> asm irq handling: 1001774 exits/sec
> asm + hsrr patch: 994719 exits/sec
>
> So as you can see, that code change does have quite an impact. But maybe the
> added complexity isn't worth it? Either way, we should try and find a
> solution that works the same way for booke and book3s - I don't want such an
> integral part to differ all that much.
How about this patch? This at least gets rid of the hsrr performance penalty:
diff --git a/arch/powerpc/kvm/book3s_segment.S
b/arch/powerpc/kvm/book3s_segment.S
index 6bae0a9..8b2fc66 100644
--- a/arch/powerpc/kvm/book3s_segment.S
+++ b/arch/powerpc/kvm/book3s_segment.S
@@ -198,6 +198,7 @@ kvmppc_interrupt:
/* Save guest PC and MSR */
#ifdef CONFIG_PPC64
BEGIN_FTR_SECTION
+ mr r10, r12
andi. r0,r12,0x2
beq 1f
mfspr r3,SPRN_HSRR0
@@ -317,23 +318,17 @@ no_dcbz32_off:
* Having set up SRR0/1 with the address where we want
* to continue with relocation on (potentially in module
* space), we either just go straight there with rfi[d],
- * or we jump to an interrupt handler with bctr if there
- * is an interrupt to be handled first. In the latter
- * case, the rfi[d] at the end of the interrupt handler
- * will get us back to where we want to continue.
+ * or we jump to an interrupt handler if there is an
+ * interrupt to be handled first. In the latter case,
+ * the rfi[d] at the end of the interrupt handler will
+ * get us back to where we want to continue.
*/
- cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL
- beq 1f
- cmpwi r12, BOOK3S_INTERRUPT_DECREMENTER
- beq 1f
- cmpwi r12, BOOK3S_INTERRUPT_PERFMON
-1: mtctr r12
-
/* Register usage at this point:
*
* R1 = host R1
* R2 = host R2
+ * R10 = raw exit handler id
* R12 = exit handler id
* R13 = shadow vcpu (32-bit) or PACA (64-bit)
* SVCPU.* = guest *
@@ -343,12 +338,26 @@ no_dcbz32_off:
PPC_LL r6, HSTATE_HOST_MSR(r13)
PPC_LL r8, HSTATE_VMHANDLER(r13)
- /* Restore host msr -> SRR1 */
+#ifdef CONFIG_PPC64
+BEGIN_FTR_SECTION
+ andi. r0,r10,0x2
+ beq 1f
+ mtspr SPRN_HSRR1, r6
+ mtspr SPRN_HSRR0, r8
+END_FTR_SECTION_IFSET(CPU_FTR_HVMODE)
+#endif
+1: /* Restore host msr -> SRR1 */
mtsrr1 r6
/* Load highmem handler address */
mtsrr0 r8
/* RFI into the highmem handler, or jump to interrupt handler */
- beqctr
+ cmpwi r12, BOOK3S_INTERRUPT_EXTERNAL
+ beqa BOOK3S_INTERRUPT_EXTERNAL
+ cmpwi r12, BOOK3S_INTERRUPT_DECREMENTER
+ beqa BOOK3S_INTERRUPT_DECREMENTER
+ cmpwi r12, BOOK3S_INTERRUPT_PERFMON
+ beqa BOOK3S_INTERRUPT_PERFMON
+
RFI
kvmppc_handler_trampoline_exit_end:
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html