Nicholas Piggin's on June 20, 2019 3:14 pm:
> The idle wake up code in the system reset interrupt is not very
> optimal. There are two requirements: perform idle wake up quickly;
> and save everything including CFAR for non-idle interrupts, with
> no performance requirement.
> 
> The problem with placing the idle test in the middle of the handler
> and using the normal handler code to save CFAR, is that it's quite
> costly (e.g., mfcfar is serialising, speculative workarounds get
> applied, SRR1 has to be reloaded, etc). It also prevents the standard
> interrupt handler boilerplate being used.
> 
> This pain can be avoided by using a dedicated idle interrupt handler
> at the start of the interrupt handler, which restores all registers
> back to the way they were in case it was not an idle wake up. CFAR
> is preserved without saving it before the non-idle case by making that
> the fall-through, and idle is a taken branch.
> 
> Performance seems to be in the noise, but possibly around 0.5% faster,
> the executed instructions certainly look better. The bigger benefit is
> being able to drop in standard interrupt handlers after the idle code,
> which helps with subsequent cleanup and consolidation.
> 
> Signed-off-by: Nicholas Piggin <npig...@gmail.com>
> ---
>  arch/powerpc/kernel/exceptions-64s.S | 89 ++++++++++++++--------------
>  1 file changed, 44 insertions(+), 45 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S 
> b/arch/powerpc/kernel/exceptions-64s.S
> index e0492912ea79..f582ae30f3f7 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -241,7 +241,7 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
>   * load KBASE for a slight optimisation.
>   */
>  #define BRANCH_TO_C000(reg, label)                                   \
> -     __LOAD_HANDLER(reg, label);                                     \
> +     __LOAD_FAR_HANDLER(reg, label);                                 \
>       mtctr   reg;                                                    \
>       bctr
>  
> @@ -784,16 +784,6 @@ EXC_VIRT_NONE(0x4000, 0x100)
>  
>  
>  EXC_REAL_BEGIN(system_reset, 0x100, 0x100)
> -     SET_SCRATCH0(r13)
> -     EXCEPTION_PROLOG_0 PACA_EXNMI
> -
> -     /* This is EXCEPTION_PROLOG_1 with the idle feature section added */
> -     OPT_SAVE_REG_TO_PACA(PACA_EXNMI+EX_PPR, r9, CPU_FTR_HAS_PPR)
> -     OPT_SAVE_REG_TO_PACA(PACA_EXNMI+EX_CFAR, r10, CPU_FTR_CFAR)
> -     INTERRUPT_TO_KERNEL
> -     SAVE_CTR(r10, PACA_EXNMI)
> -     mfcr    r9
> -
>  #ifdef CONFIG_PPC_P7_NAP
>       /*
>        * If running native on arch 2.06 or later, check if we are waking up
> @@ -801,45 +791,67 @@ EXC_REAL_BEGIN(system_reset, 0x100, 0x100)
>        * bits 46:47. A non-0 value indicates that we are coming from a power
>        * saving state. The idle wakeup handler initially runs in real mode,
>        * but we branch to the 0xc000... address so we can turn on relocation
> -      * with mtmsr.
> +      * with mtmsrd later, after SPRs are restored.
> +      *
> +      * Careful to minimise cost for the fast path (idle wakeup) while
> +      * also avoiding clobbering CFAR for the non-idle case. Once we know
> +      * it is an idle wake, volatiles don't matter, which is why we use
> +      * those here, and then re-do the entry in case of non-idle (without
> +      * branching for the non-idle case, to keep CFAR).
>        */
>  BEGIN_FTR_SECTION
> -     mfspr   r10,SPRN_SRR1
> -     rlwinm. r10,r10,47-31,30,31
> -     beq-    1f
> -     cmpwi   cr1,r10,2
> +     SET_SCRATCH0(r13)
> +     GET_PACA(r13)
> +     std     r3,PACA_EXNMI+0*8(r13)
> +     std     r4,PACA_EXNMI+1*8(r13)
> +     std     r5,PACA_EXNMI+2*8(r13)
>       mfspr   r3,SPRN_SRR1
> -     bltlr   cr1     /* no state loss, return to idle caller */
> -     BRANCH_TO_C000(r10, system_reset_idle_common)
> -1:
> +     mfocrf  r4,0x80
> +     rlwinm. r5,r3,47-31,30,31
> +     bne+    system_reset_idle_wake
> +     /* Not powersave wakeup. Restore regs for regular interrupt handler. */
> +     mtocrf  0x80,r4
> +     ld      r12,PACA_EXNMI+0*8(r13)
> +     ld      r4,PACA_EXNMI+1*8(r13)
> +     ld      r5,PACA_EXNMI+2*8(r13)
> +     GET_SCRATCH0(r13)

For the love of... that should be 'ld r3', not 'ld r12', sorry.

Thanks,
Nick

Reply via email to