Hi Ben,

Small comment inline.

On 02/09/2012 06:25 AM, Benjamin Herrenschmidt wrote:
 From 0ace17ba6960a8788b1bda3770df254cbbc6a244 Mon Sep 17 00:00:00 2001
From: Benjamin Herrenschmidt<b...@kernel.crashing.org>
Date: Thu, 9 Feb 2012 15:25:04 +1100
Subject: [PATCH] powerpc: Rework lazy-interrupt handling

The current implementation of lazy interrupts handling has some
issues that this tries to address.

Except on iSeries, we don't do the various workarounds we need to
do on re-enable when returning from an interrupt, which can do an
implicit re-enable, and thus we may still lose or get delayed
decrementer or doorbell interrupts.

The current scheme also makes it much harder to handle the external
"edge" interrupts provided by some BookE processors when using the
EPR facility (External Proxy) and the Freescale Hypervisor.

We also hard mask on decrementer interrupts which is sub-optimal.

This is an attempt at fixing it all in one go by reworking the way
we do the lazy interrupt disabling.

The base idea is to replace the "hard_enabled" field with a
"irq_happened" field in which we store a bit mask of what interrupt
occurred while soft-disabled.

When re-enabling, either via arch_local_irq_restore() or when returning
from an interrupt, we can now decide what to do by testing bits in that
field. We then implement re-emitting of the lost interrupts via either
a re-use of the existing exception frame (exception exit case) or via
the creation of a new one from assembly code (arch_local_irq_enable),
without the need to trigger a fake one using set_dec() or similar.

In addition, this adds a few refinements:

  - We no longer  hard disable decrementer interrupts that occur
while soft-disabled. We now simply bump the decrementer back to max
(on BookS) or leave it stopped (on BookE) and continue with hard interrupts
enabled, which means that we'll potentially get better sample quality from
performance monitor interrupts.

  - Timer, decrementer and doorbell interrupts now hard-enable
shortly after removing the source of the interrupt, which means
they no longer run entirely hard disabled. Again, this will improve
perf sample quality.

  - On Book3E 64-bit, we now make the performance monitor interrupt
act as an NMI like Book3S (the necessary C code for that to work
appear to already be present in the FSL perf code, notably calling
nmi_enter instead of irq_enter).

There are additional refinements that we can do on top of this patch:

  - We could remove the ps3 workaround from arch_local_irq_enable(),
I believe that it should no longer be necessary

  - We could make "masked" decrementer interrupts act as NMIs when doing
timer-based perf sampling to improve the sample quality.

  - There are additional simplifications of the exception entry/exit path
that I've spotted along the way, such as merging fast_exception_return
with the normal code path.

This patch needs a LOT more testing&  review than it had so far !!!

Not-signed-off-by-yet: Benjamin Herrenschmidt<b...@kernel.crashing.org>
---

v2:

- Add hard-enable to decrementer, timer and doorbells
- Fix CR clobber in masked irq handling on BookE
- Make embedded perf interrupt act as an NMI
- Add a PACA_HAPPENED_EE_EDGE for use by FSL if they want
   to retrigger an interrupt without preventing hard-enable

Signed-off-by: Benjamin Herrenschmidt<b...@kernel.crashing.org>
---
  arch/powerpc/include/asm/exception-64s.h        |   21 ++-
  arch/powerpc/include/asm/hw_irq.h               |   51 +++++-
  arch/powerpc/include/asm/irqflags.h             |   13 +-
  arch/powerpc/include/asm/paca.h                 |    2 +-
  arch/powerpc/kernel/asm-offsets.c               |    2 +-
  arch/powerpc/kernel/dbell.c                     |   12 ++
  arch/powerpc/kernel/entry_64.S                  |   96 ++++++-----
  arch/powerpc/kernel/exceptions-64e.S            |  210 ++++++++++++++++-------
  arch/powerpc/kernel/exceptions-64s.S            |   90 ++++++----
  arch/powerpc/kernel/head_64.S                   |    9 -
  arch/powerpc/kernel/idle_book3e.S               |    8 +-
  arch/powerpc/kernel/idle_power4.S               |   17 ++-
  arch/powerpc/kernel/idle_power7.S               |   20 ++-
  arch/powerpc/kernel/irq.c                       |  187 ++++++++++++++------
  arch/powerpc/kernel/time.c                      |   15 ++-
  arch/powerpc/platforms/iseries/Makefile         |    2 +-
  arch/powerpc/platforms/iseries/exception.S      |   11 +-
  arch/powerpc/platforms/iseries/misc.S           |   26 ---
  arch/powerpc/platforms/pseries/processor_idle.c |   24 +++-
  19 files changed, 540 insertions(+), 276 deletions(-)
  delete mode 100644 arch/powerpc/platforms/iseries/misc.S


[snip]

diff --git a/arch/powerpc/kernel/exceptions-64e.S 
b/arch/powerpc/kernel/exceptions-64e.S
index 429983c..7fa4096 100644
--- a/arch/powerpc/kernel/exceptions-64e.S
+++ b/arch/powerpc/kernel/exceptions-64e.S
@@ -21,6 +21,7 @@
  #include<asm/exception-64e.h>
  #include<asm/bug.h>
  #include<asm/irqflags.h>
+#include<asm/hw_irq.h>
  #include<asm/ptrace.h>
  #include<asm/ppc-opcode.h>
  #include<asm/mmu.h>
@@ -77,59 +78,55 @@
  #define SPRN_MC_SRR1  SPRN_MCSRR1

  #define NORMAL_EXCEPTION_PROLOG(n, addition)                              \
-       EXCEPTION_PROLOG(n, GEN, addition##_GEN)
+       EXCEPTION_PROLOG(n, GEN, addition##_GEN(n))

  #define CRIT_EXCEPTION_PROLOG(n, addition)                                \
-       EXCEPTION_PROLOG(n, CRIT, addition##_CRIT)
+       EXCEPTION_PROLOG(n, CRIT, addition##_CRIT(n))

  #define DBG_EXCEPTION_PROLOG(n, addition)                                 \
-       EXCEPTION_PROLOG(n, DBG, addition##_DBG)
+       EXCEPTION_PROLOG(n, DBG, addition##_DBG(n))

  #define MC_EXCEPTION_PROLOG(n, addition)                                  \
-       EXCEPTION_PROLOG(n, MC, addition##_MC)
+       EXCEPTION_PROLOG(n, MC, addition##_MC(n))


  /* Variants of the "addition" argument for the prolog
   */
-#define PROLOG_ADDITION_NONE_GEN
-#define PROLOG_ADDITION_NONE_CRIT
-#define PROLOG_ADDITION_NONE_DBG
-#define PROLOG_ADDITION_NONE_MC
+#define PROLOG_ADDITION_NONE_GEN(n)
+#define PROLOG_ADDITION_NONE_CRIT(n)
+#define PROLOG_ADDITION_NONE_DBG(n)
+#define PROLOG_ADDITION_NONE_MC(n)

-#define PROLOG_ADDITION_MASKABLE_GEN                                       \
+#define PROLOG_ADDITION_MASKABLE_GEN(n)                                        
    \
        lbz     r11,PACASOFTIRQEN(r13); /* are irqs soft-disabled ? */      \
        cmpwi   cr0,r11,0;              /* yes ->  go out of line */     \
-       beq     masked_interrupt_book3e;
+       beq     masked_interrupt_book3e_##n;

-#define PROLOG_ADDITION_2REGS_GEN                                          \
+#define PROLOG_ADDITION_2REGS_GEN(n)                                       \
        std     r14,PACA_EXGEN+EX_R14(r13);                                 \
        std     r15,PACA_EXGEN+EX_R15(r13)

-#define PROLOG_ADDITION_1REG_GEN                                           \
+#define PROLOG_ADDITION_1REG_GEN(n)                                        \
        std     r14,PACA_EXGEN+EX_R14(r13);

-#define PROLOG_ADDITION_2REGS_CRIT                                         \
+#define PROLOG_ADDITION_2REGS_CRIT(n)                                      \
        std     r14,PACA_EXCRIT+EX_R14(r13);                                \
        std     r15,PACA_EXCRIT+EX_R15(r13)

-#define PROLOG_ADDITION_2REGS_DBG                                          \
+#define PROLOG_ADDITION_2REGS_DBG(n)                                       \
        std     r14,PACA_EXDBG+EX_R14(r13);                                 \
        std     r15,PACA_EXDBG+EX_R15(r13)

-#define PROLOG_ADDITION_2REGS_MC                                           \
+#define PROLOG_ADDITION_2REGS_MC(n)                                        \
        std     r14,PACA_EXMC+EX_R14(r13);                                  \
        std     r15,PACA_EXMC+EX_R15(r13)

-#define PROLOG_ADDITION_DOORBELL_GEN                                       \
-       lbz     r11,PACASOFTIRQEN(r13); /* are irqs soft-disabled ? */      \
-       cmpwi   cr0,r11,0;              /* yes ->  go out of line */     \
-       beq     masked_doorbell_book3e
-

  /* Core exception code for all exceptions except TLB misses.
   * XXX: Needs to make SPRN_SPRG_GEN depend on exception type
   */
  #define EXCEPTION_COMMON(n, excf, ints)                                       
    \
+exc_##n##_common:                                                          \
        std     r0,GPR0(r1);            /* save r0 in stackframe */         \
        std     r2,GPR2(r1);            /* save r2 in stackframe */         \
        SAVE_4GPRS(3, r1);              /* save r3 - r6 in stackframe */    \
@@ -167,19 +164,25 @@
        std     r0,RESULT(r1);          /* clear regs->result */         \
        ints;

-/* Variants for the "ints" argument */
-#define INTS_KEEP
-#define INTS_DISABLE_SOFT                                                  \
+/* Variants for the "ints" argument. We know r0 is 0 on entry */
+#define INTS_KEEP                                                          \
        stb     r0,PACASOFTIRQEN(r13);  /* mark interrupts soft-disabled */ \
        TRACE_DISABLE_INTS;
-#define INTS_DISABLE_HARD                                                  \
-       stb     r0,PACAHARDIRQEN(r13); /* and hard disabled */
-#define INTS_DISABLE_ALL                                                   \
-       INTS_DISABLE_SOFT                                                   \
-       INTS_DISABLE_HARD
-
-/* This is called by exceptions that used INTS_KEEP (that is did not clear
- * neither soft nor hard IRQ indicators in the PACA. This will restore MSR:EE
+
+/* This second version is meant for exceptions that don't immediately
+ * hard-enable. We set a bit in paca->irq_happened to ensure that
+ * a subsequent call to arch_local_irq_restore() will properly
+ * hard-enable and avoid the fast-path
+ */
+#define INTS_DISABLE                                                       \
+       stb     r0,PACASOFTIRQEN(r13);  /* mark interrupts soft-disabled */ \
+       lbz     r0,PACAIRQHAPPENED(r13);                                    \
+       ori     r0,r0,PACA_HAPPENED;                                        \
+       stb     r0,PACAIRQHAPPENED(r13);                                    \
+       TRACE_DISABLE_INTS;
+
+/* This is called by exceptions that used INTS_KEEP (that is did
+ * not set hard IRQ indicators in the PACA). This will restore MSR:EE
   * to it's previous value
   *
   * XXX In the long run, we may want to open-code it in order to separate the
@@ -238,7 +241,7 @@ exc_##n##_bad_stack:                                        
                    \
  #define MASKABLE_EXCEPTION(trapnum, label, hdlr, ack)                 \
        START_EXCEPTION(label);                                         \
        NORMAL_EXCEPTION_PROLOG(trapnum, PROLOG_ADDITION_MASKABLE)      \
-       EXCEPTION_COMMON(trapnum, PACA_EXGEN, INTS_DISABLE_ALL)         \
+       EXCEPTION_COMMON(trapnum, PACA_EXGEN, INTS_DISABLE)             \
        ack(r8);                                                        \
        CHECK_NAPPING();                                                \
        addi    r3,r1,STACK_FRAME_OVERHEAD;                             \
@@ -289,7 +292,7 @@ interrupt_end_book3e:
  /* Critical Input Interrupt */
        START_EXCEPTION(critical_input);
        CRIT_EXCEPTION_PROLOG(0x100, PROLOG_ADDITION_NONE)
-//     EXCEPTION_COMMON(0x100, PACA_EXCRIT, INTS_DISABLE_ALL)
+//     EXCEPTION_COMMON(0x100, PACA_EXCRIT, INTS_DISABLE)
  //    bl      special_reg_save_crit
  //    CHECK_NAPPING();
  //    addi    r3,r1,STACK_FRAME_OVERHEAD
@@ -300,7 +303,7 @@ interrupt_end_book3e:
  /* Machine Check Interrupt */
        START_EXCEPTION(machine_check);
        CRIT_EXCEPTION_PROLOG(0x200, PROLOG_ADDITION_NONE)
-//     EXCEPTION_COMMON(0x200, PACA_EXMC, INTS_DISABLE_ALL)
+//     EXCEPTION_COMMON(0x200, PACA_EXMC, INTS_DISABLE)
  //    bl      special_reg_save_mc
  //    addi    r3,r1,STACK_FRAME_OVERHEAD
  //    CHECK_NAPPING();
@@ -339,7 +342,7 @@ interrupt_end_book3e:
        START_EXCEPTION(program);
        NORMAL_EXCEPTION_PROLOG(0x700, PROLOG_ADDITION_1REG)
        mfspr   r14,SPRN_ESR
-       EXCEPTION_COMMON(0x700, PACA_EXGEN, INTS_DISABLE_SOFT)
+       EXCEPTION_COMMON(0x700, PACA_EXGEN, INTS_KEEP)
        std     r14,_DSISR(r1)
        addi    r3,r1,STACK_FRAME_OVERHEAD
        ld      r14,PACA_EXGEN+EX_R14(r13)
@@ -372,7 +375,7 @@ interrupt_end_book3e:
  /* Watchdog Timer Interrupt */
        START_EXCEPTION(watchdog);
        CRIT_EXCEPTION_PROLOG(0x9f0, PROLOG_ADDITION_NONE)
-//     EXCEPTION_COMMON(0x9f0, PACA_EXCRIT, INTS_DISABLE_ALL)
+//     EXCEPTION_COMMON(0x9f0, PACA_EXCRIT, INTS_DISABLE)
  //    bl      special_reg_save_crit
  //    CHECK_NAPPING();
  //    addi    r3,r1,STACK_FRAME_OVERHEAD
@@ -450,7 +453,7 @@ interrupt_end_book3e:
        mfspr   r15,SPRN_SPRG_CRIT_SCRATCH
        mtspr   SPRN_SPRG_GEN_SCRATCH,r15
        mfspr   r14,SPRN_DBSR
-       EXCEPTION_COMMON(0xd00, PACA_EXCRIT, INTS_DISABLE_ALL)
+       EXCEPTION_COMMON(0xd00, PACA_EXCRIT, INTS_DISABLE)
        std     r14,_DSISR(r1)
        addi    r3,r1,STACK_FRAME_OVERHEAD
        mr      r4,r14
@@ -515,7 +518,7 @@ kernel_dbg_exc:
        mfspr   r15,SPRN_SPRG_DBG_SCRATCH
        mtspr   SPRN_SPRG_GEN_SCRATCH,r15
        mfspr   r14,SPRN_DBSR
-       EXCEPTION_COMMON(0xd00, PACA_EXDBG, INTS_DISABLE_ALL)
+       EXCEPTION_COMMON(0xd08, PACA_EXDBG, INTS_DISABLE)
        std     r14,_DSISR(r1)
        addi    r3,r1,STACK_FRAME_OVERHEAD
        mr      r4,r14
@@ -525,21 +528,22 @@ kernel_dbg_exc:
        bl      .DebugException
        b       .ret_from_except

-       MASKABLE_EXCEPTION(0x260, perfmon, .performance_monitor_exception, 
ACK_NONE)
+       START_EXCEPTION(perfmon);
+       NORMAL_EXCEPTION_PROLOG(0x260, PROLOG_ADDITION_NONE)
+       EXCEPTION_COMMON(0x260, PACA_EXGEN, INTS_DISABLE)
+       addi    r3,r1,STACK_FRAME_OVERHEAD
+       ld      r14,PACA_EXGEN+EX_R14(r13)
+       bl      .save_nvgprs
+       bl      .performance_monitor_exception
+       b       .ret_from_except

  /* Doorbell interrupt */
-       START_EXCEPTION(doorbell)
-       NORMAL_EXCEPTION_PROLOG(0x2070, PROLOG_ADDITION_DOORBELL)
-       EXCEPTION_COMMON(0x2070, PACA_EXGEN, INTS_DISABLE_ALL)
-       CHECK_NAPPING()
-       addi    r3,r1,STACK_FRAME_OVERHEAD
-       bl      .doorbell_exception
-       b       .ret_from_except_lite
+       MASKABLE_EXCEPTION(0x280, doorbell, .doorbell_exception, ACK_NONE)

  /* Doorbell critical Interrupt */
        START_EXCEPTION(doorbell_crit);
-       CRIT_EXCEPTION_PROLOG(0x2080, PROLOG_ADDITION_NONE)
-//     EXCEPTION_COMMON(0x2080, PACA_EXCRIT, INTS_DISABLE_ALL)
+       CRIT_EXCEPTION_PROLOG(0x2a0, PROLOG_ADDITION_NONE)
+//     EXCEPTION_COMMON(0x280, PACA_EXCRIT, INTS_DISABLE)
  //    bl      special_reg_save_crit
  //    CHECK_NAPPING();
  //    addi    r3,r1,STACK_FRAME_OVERHEAD
@@ -547,38 +551,116 @@ kernel_dbg_exc:
  //    b       ret_from_crit_except
        b       .

+/* Guest Doorbell */
        MASKABLE_EXCEPTION(0x2c0, guest_doorbell, .unknown_exception, ACK_NONE)
-       MASKABLE_EXCEPTION(0x2e0, guest_doorbell_crit, .unknown_exception, 
ACK_NONE)
-       MASKABLE_EXCEPTION(0x310, hypercall, .unknown_exception, ACK_NONE)
-       MASKABLE_EXCEPTION(0x320, ehpriv, .unknown_exception, ACK_NONE)
+
+/* Guest Doorbell critical Interrupt */
+       START_EXCEPTION(guest_doorbell_crit);
+       CRIT_EXCEPTION_PROLOG(0x2e0, PROLOG_ADDITION_NONE)
+//     EXCEPTION_COMMON(0x2e0, PACA_EXCRIT, INTS_DISABLE)
+//     bl      special_reg_save_crit
+//     CHECK_NAPPING();
+//     addi    r3,r1,STACK_FRAME_OVERHEAD
+//     bl      .guest_doorbell_critical_exception
+//     b       ret_from_crit_except
+       b       .
+
+/* Hypervisor call */
+       START_EXCEPTION(hypercall);
+       NORMAL_EXCEPTION_PROLOG(0x310, PROLOG_ADDITION_NONE)
+       EXCEPTION_COMMON(0x310, PACA_EXGEN, INTS_KEEP)
+       addi    r3,r1,STACK_FRAME_OVERHEAD
+       bl      .save_nvgprs
+       INTS_RESTORE_HARD
+       bl      .unknown_exception
+       b       .ret_from_except
+
+/* Embedded Hypervisor priviledged  */
+       START_EXCEPTION(ehpriv);
+       NORMAL_EXCEPTION_PROLOG(0x320, PROLOG_ADDITION_NONE)
+       EXCEPTION_COMMON(0x320, PACA_EXGEN, INTS_KEEP)
+       addi    r3,r1,STACK_FRAME_OVERHEAD
+       bl      .save_nvgprs
+       INTS_RESTORE_HARD
+       bl      .unknown_exception
+       b       .ret_from_except


  /*
- * An interrupt came in while soft-disabled; clear EE in SRR1,
- * clear paca->hard_enabled and return.
+ * An interrupt came in while soft-disabled; We mark paca->irq_happened
+ * accordingly and if the interrupt is level sensitive, we hard disable
   */
-masked_doorbell_book3e:
-       mtcr    r10
-       /* Resend the doorbell to fire again when ints enabled */
-       mfspr   r10,SPRN_PIR
-       PPC_MSGSND(r10)
-       b       masked_interrupt_book3e_common

-masked_interrupt_book3e:
+masked_interrupt_book3e_0x500:
+       li      r11,PACA_HAPPENED_EE
+       b       masked_interrupt_book3e_full_mask
+
+masked_interrupt_book3e_0x900:
+       ACK_DEC(r11);
+       li      r11,PACA_HAPPENED_DEC
+       b       masked_interrupt_book3e_no_mask
+masked_interrupt_book3e_0x980:
+       ACK_FIT(r11);
+       li      r11,PACA_HAPPENED_DEC
+       b       masked_interrupt_book3e_no_mask
+masked_interrupt_book3e_0x280:
+masked_interrupt_book3e_0x2c0:
+       li      r11,PACA_HAPPENED_DBELL
+       b       masked_interrupt_book3e_no_mask
+
+masked_interrupt_book3e_no_mask:
+       mtcr    r10
+       lbz     r10,PACAIRQHAPPENED(r13)
+       ori     r10,r10,r11

Shouldn't this be an 'or'?

+       stb     r10,PACAIRQHAPPENED(r13)
+       b       1f
+masked_interrupt_book3e_full_mask:
        mtcr    r10
-masked_interrupt_book3e_common:
-       stb     r11,PACAHARDIRQEN(r13)
+       lbz     r10,PACAIRQHAPPENED(r13)
+       ori     r10,r10,r11

Same comment.

+       stb     r10,PACAIRQHAPPENED(r13)
        mfspr   r10,SPRN_SRR1
        rldicl  r11,r10,48,1            /* clear MSR_EE */
        rotldi  r10,r11,16
        mtspr   SPRN_SRR1,r10
-       ld      r10,PACA_EXGEN+EX_R10(r13);     /* restore registers */
+1:     ld      r10,PACA_EXGEN+EX_R10(r13);
        ld      r11,PACA_EXGEN+EX_R11(r13);
        mfspr   r13,SPRN_SPRG_GEN_SCRATCH;
        rfi
        b       .

  /*
+ * Called from arch_local_irq_enable when an interrupt needs
+ * to be resent. r3 contains either 0x500,0x900,0x260 or 0x280
+ * to indicate the kind of interrupt. MSR:EE is already off.
+ * We generate a stackframe like if a real interrupt had happened.
+ *
+ * Note: While MSR:EE is off, we need to make sure that _MSR
+ * in the generated frame has EE set to 1 or the exception
+ * handler will not properly re-enable them.
+ */
+_GLOBAL(__reemit_interrupt)
+       /* We are going to jump to the exception common code which
+        * will retrieve various register values from the PACA which
+        * we don't give a damn about.
+        */
+       mflr    r10
+       mfmsr   r11
+       mfcr    r4;
+       mtspr   SPRN_SPRG_GEN_SCRATCH,r13;
+       std     r1,PACA_EXGEN+EX_R1(r13);
+       stw     r4,PACA_EXGEN+EX_CR(r13);
+       ori     r11,r11,MSR_EE
+       subi    r1,r1,INT_FRAME_SIZE;
+       cmpwi   cr0,r3,0x500
+       beq     exc_0x500_common
+       cmpwi   cr0,r3,0x900
+       beq+    exc_0x900_common
+       cmpwi   cr0,r3,0x280
+       beq+    exc_0x280_common
+       blr
+
+/*
   * This is called from 0x300 and 0x400 handlers after the prologs with
   * r14 and r15 containing the fault address and error code, with the
   * original values stashed away in the PACA
@@ -680,6 +762,8 @@ BAD_STACK_TRAMPOLINE(0x000)
  BAD_STACK_TRAMPOLINE(0x100)
  BAD_STACK_TRAMPOLINE(0x200)
  BAD_STACK_TRAMPOLINE(0x260)
+BAD_STACK_TRAMPOLINE(0x280)
+BAD_STACK_TRAMPOLINE(0x2a0)
  BAD_STACK_TRAMPOLINE(0x2c0)
  BAD_STACK_TRAMPOLINE(0x2e0)
  BAD_STACK_TRAMPOLINE(0x300)

---
Best Regards, Laurentiu

_______________________________________________
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Reply via email to