Hi

Currently, smp_processor_id() is used to fetch the current CPU in
cpu_idle_loop(). On every pass through the loop, the current CPU is
fetched again with smp_processor_id().

The idle thread is per-CPU, so its current CPU is constant and cannot
change at runtime. Moving the smp_processor_id() call before the loop
therefore saves execution cycles inside the loop.

Patch:
----------------------------------------------------------------------

diff --git a/kernel/sched/idle.c b/kernel/sched/idle.c
index 1214f0a..82698e5 100644
--- a/kernel/sched/idle.c
+++ b/kernel/sched/idle.c
@@ -185,6 +185,8 @@ exit_idle:
*/
static void cpu_idle_loop(void)
{
+       int cpu_id;
+       cpu_id = smp_processor_id();

        while(1) {
            /*
            * If the arch has a polling bit, we maintain an invariant:
@@ -202,7 +204,7 @@ static void cpu_idle_loop(void)
                        check_pgt_cache();
                        rmb();

-                       if (cpu_is_offline(smp_processor_id()))
+                       if (cpu_is_offline(cpu_id))
                                arch_cpu_idle_dead();

                        local_irq_disable();

--------------------------------------------------------------------

With the patch applied, I compared the generated assembly on x86 and
ARM64: the instructions related to smp_processor_id() are no longer
executed inside the loop.

For x86:

Before the patch (inside the loop):

148:   0f ae e8                lfence
14b:   65 8b 04 25 00 00 00    mov    %gs:0x0,%eax
152:   00
153:   89 c0                   mov    %eax,%eax
155:   49 0f a3 04 24          bt     %rax,(%r12)

After the patch (inside the loop):

150:   0f ae e8                lfence
153:   4d 0f a3 34 24          bt     %r14,(%r12)


For ARM64:

Before the patch (inside the loop):

168:   d5033d9f        dsb     ld
16c:   b9405661        ldr     w1, [x19,#84]
170:   1100fc20        add     w0, w1, #0x3f
174:   6b1f003f        cmp     w1, wzr
178:   1a81b000        csel    w0, w0, w1, lt
17c:   13067c00        asr     w0, w0, #6
180:   937d7c00        sbfiz   x0, x0, #3, #32
184:   f8606aa0        ldr     x0, [x21,x0]
188:   9ac12401        lsr     x1, x0, x1
18c:   36000e61        tbz     w1, #0, 358

After the patch (inside the loop):

1a8:   d5033d9f        dsb     ld
1ac:   f8776ac0        ldr     x0, [x22,x23]
1b0:   ea18001f        tst     x0, x24
1b4:   54000ea0        b.eq    388

Further observation over 4 seconds on ARM64 shows that cpu_idle_loop
is hit 8672 times. Computing the CPU number once, before the loop,
therefore saves these instructions on every hit and, eventually, time
as well.

Signed-off-by: gaurav jindal <gaurav.jin...@spreadtrum.com>
Reviewed-by: sanjeev yadav <sanjeev.ya...@spreadtrum.com>
