On Mon, Apr 27, 2015 at 08:38:54PM +0200, Borislav Petkov wrote:
> I'm running them now and will report numbers relative to the last run
> once it is done. And those numbers should in practice get even better if
> we revert to the simpler canonical-ness check but let's see...

Results are in. The new row, F:, is the run with the F16h NOPs.

Everything else is equal; F: has this change on top:

---
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index aef653193160..d713080005ef 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -227,6 +227,14 @@ void __init arch_init_ideal_nops(void)
 #endif
                }
                break;
+
+       case X86_VENDOR_AMD:
+               if (boot_cpu_data.x86 == 0x16) {
+                       ideal_nops = p6_nops;
+                       return;
+               }
+
+               /* fall through */
        default:
 #ifdef CONFIG_X86_64
                ideal_nops = k8_nops;
---

... cycles, instructions, branches, branch-misses and context-switches
drop or stay roughly the same. BUT(!) the timings increase: cpu-clock,
task-clock and the elapsed time of the workload are the worst of all the
runs.
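To put a number on that timing regression, one can take the "seconds time elapsed" means from the perf output and compare the slowest run with the fastest. A minimal sketch: the values are copied from those rows, and argmax(), argmin() and pct_slower() are hypothetical helpers, not part of perf.

```c
#include <assert.h>
#include <stddef.h>

/* Mean elapsed time in seconds for runs A..F, copied from the perf stat output. */
static const double elapsed_s[] = {
	709.528485252, /* A */
	708.976557288, /* B */
	709.312844791, /* C */
	709.400050112, /* D */
	708.914562508, /* E */
	709.602255085, /* F: with the F16h NOPs */
};

/* Index of the largest value in v[0..n-1]; n must be > 0. */
static size_t argmax(const double *v, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (v[i] > v[best])
			best = i;
	return best;
}

/* Index of the smallest value in v[0..n-1]; n must be > 0. */
static size_t argmin(const double *v, size_t n)
{
	size_t best = 0;

	assert(n > 0);
	for (size_t i = 1; i < n; i++)
		if (v[i] < v[best])
			best = i;
	return best;
}

/* How much slower, in percent, 'slow' is than 'fast'. */
static double pct_slower(double slow, double fast)
{
	return (slow - fast) / fast * 100.0;
}
```

With these numbers F is the slowest row and E the fastest, and F is about 0.097% slower than E: small in absolute terms, but larger than the +-0.02% and +-0.06% that perf reports for those two rows.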

So either those NOPs are not really optimal (so much for trusting the
manuals :-)) or their alignment is the problem.

But look at section "2.7.2.1 Encoding Padding for Loop Alignment" in the
manual: those NOPs are meant to be used as padding, so they themselves
will not necessarily be aligned when you use them to pad code.
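The padding point in a nutshell: if a loop would start at address addr and should begin on an align-byte boundary, the pad length is the distance to the next multiple of align, and the pad itself starts at the unaligned addr. A minimal sketch; loop_pad_len() is a hypothetical helper, and the 16- and 32-byte targets used with it are examples, not values taken from the manual.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of padding bytes needed to move a loop that would start at
 * 'addr' up to the next multiple of 'align' (a power of two). The
 * padding begins at 'addr' itself, so it is generally NOT aligned.
 */
static unsigned int loop_pad_len(uintptr_t addr, uintptr_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (unsigned int)((align - (addr & (align - 1))) & (align - 1));
}
```

For example, a loop that would start at 0x1003 needs 13 bytes of padding to reach 0x1010, and those 13 bytes of NOPs start at the odd address 0x1003.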

Or maybe the longer NOPs are simply worse than the shorter 4-byte ones
(three 0x66 prefixes plus 0x90), which should "flow" more easily through
the pipeline because they are shorter.
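The trade-off can be illustrated by filling the same gap with each kind of NOP. In this sketch the "K8-style" n-byte NOP (n <= 4) is n-1 0x66 prefixes followed by 0x90, and the "P6-style" table uses the 0F 1F long-NOP forms up to 8 bytes. The byte sequences are my reading of the usual multi-byte NOP encodings (as in the kernel's asm/nops.h), the greedy fill loop is a simplified model rather than the kernel's own code, and pad_k8()/pad_p6() are hypothetical helpers.

```c
#include <assert.h>
#include <string.h>

/* K8-style: up to three 0x66 prefixes on 0x90; an n-byte NOP is the last n bytes. */
static const unsigned char k8_style_nop4[4] = { 0x66, 0x66, 0x66, 0x90 };

/* P6-style long NOPs, indexed by length - 1 (1..8 bytes). */
static const unsigned char p6_style_nops[8][8] = {
	{ 0x90 },
	{ 0x66, 0x90 },
	{ 0x0f, 0x1f, 0x00 },
	{ 0x0f, 0x1f, 0x40, 0x00 },
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/* Fill buf with len bytes of K8-style NOPs; return the number of instructions. */
static int pad_k8(unsigned char *buf, int len)
{
	int insns = 0;

	assert(len >= 0);
	while (len > 0) {
		int n = len > 4 ? 4 : len;

		memcpy(buf, &k8_style_nop4[4 - n], (size_t)n);
		buf += n;
		len -= n;
		insns++;
	}
	return insns;
}

/* Fill buf with len bytes of P6-style NOPs; return the number of instructions. */
static int pad_p6(unsigned char *buf, int len)
{
	int insns = 0;

	assert(len >= 0);
	while (len > 0) {
		int n = len > 8 ? 8 : len;

		memcpy(buf, p6_style_nops[n - 1], (size_t)n);
		buf += n;
		len -= n;
		insns++;
	}
	return insns;
}
```

Padding 15 bytes takes four K8-style NOPs (4+4+4+3) but only two P6-style ones (8+7): fewer instructions, but each one longer. Which of the two goes through the pipeline faster is exactly the open question here.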

Or something completely different...

Oh well, enough measurements for today - will do the rc1 measurement
tomorrow.

Thanks.

---
 Performance counter stats for 'system wide' (10 runs):

A:    2835570.145246      cpu-clock (msec)    ( +-  0.02% ) [100.00%]
B:    2833364.074970      cpu-clock (msec)    ( +-  0.04% ) [100.00%]
C:    2834708.335431      cpu-clock (msec)    ( +-  0.02% ) [100.00%]
D:    2835055.118431      cpu-clock (msec)    ( +-  0.01% ) [100.00%]
E:    2833115.118624      cpu-clock (msec)    ( +-  0.06% ) [100.00%]
F:    2835863.670798      cpu-clock (msec)    ( +-  0.02% ) [100.00%]

A:    2835570.099981      task-clock (msec)   #  3.996 CPUs utilized  ( +-  0.02% ) [100.00%]
B:    2833364.073633      task-clock (msec)   #  3.996 CPUs utilized  ( +-  0.04% ) [100.00%]
C:    2834708.350387      task-clock (msec)   #  3.996 CPUs utilized  ( +-  0.02% ) [100.00%]
D:    2835055.094383      task-clock (msec)   #  3.996 CPUs utilized  ( +-  0.01% ) [100.00%]
E:    2833115.145292      task-clock (msec)   #  3.996 CPUs utilized  ( +-  0.06% ) [100.00%]
F:    2835863.719556      task-clock (msec)   #  3.996 CPUs utilized  ( +-  0.02% ) [100.00%]

A: 5,591,213,166,613      cycles              #  1.972 GHz  ( +-  0.03% ) [75.00%]
B: 5,585,023,802,888      cycles              #  1.971 GHz  ( +-  0.03% ) [75.00%]
C: 5,587,983,212,758      cycles              #  1.971 GHz  ( +-  0.02% ) [75.00%]
D: 5,584,838,532,936      cycles              #  1.970 GHz  ( +-  0.03% ) [75.00%]
E: 5,583,979,727,842      cycles              #  1.971 GHz  ( +-  0.05% ) [75.00%]
F: 5,581,639,840,197      cycles              #  1.968 GHz  ( +-  0.03% ) [75.00%]

A: 3,106,707,101,530      instructions        #  0.56 insns per cycle  ( +-  0.01% ) [75.00%]
B: 3,106,632,251,528      instructions        #  0.56 insns per cycle  ( +-  0.00% ) [75.00%]
C: 3,106,265,958,142      instructions        #  0.56 insns per cycle  ( +-  0.00% ) [75.00%]
D: 3,106,294,801,185      instructions        #  0.56 insns per cycle  ( +-  0.00% ) [75.00%]
E: 3,106,381,223,355      instructions        #  0.56 insns per cycle  ( +-  0.01% ) [75.00%]
F: 3,105,996,162,436      instructions        #  0.56 insns per cycle  ( +-  0.00% ) [75.00%]

A:   683,676,044,429      branches            #  241.107 M/sec  ( +-  0.01% ) [75.00%]
B:   683,670,899,595      branches            #  241.293 M/sec  ( +-  0.01% ) [75.00%]
C:   683,675,772,858      branches            #  241.180 M/sec  ( +-  0.01% ) [75.00%]
D:   683,683,533,664      branches            #  241.154 M/sec  ( +-  0.00% ) [75.00%]
E:   683,648,518,667      branches            #  241.306 M/sec  ( +-  0.01% ) [75.00%]
F:   683,663,028,656      branches            #  241.078 M/sec  ( +-  0.00% ) [75.00%]

A:    43,829,535,008      branch-misses       #  6.41% of all branches  ( +-  0.02% ) [75.00%]
B:    43,844,118,416      branch-misses       #  6.41% of all branches  ( +-  0.03% ) [75.00%]
C:    43,819,871,086      branch-misses       #  6.41% of all branches  ( +-  0.02% ) [75.00%]
D:    43,795,107,998      branch-misses       #  6.41% of all branches  ( +-  0.02% ) [75.00%]
E:    43,801,985,070      branch-misses       #  6.41% of all branches  ( +-  0.02% ) [75.00%]
F:    43,804,449,271      branch-misses       #  6.41% of all branches  ( +-  0.02% ) [75.00%]

A:         2,030,357      context-switches    #  0.716 K/sec  ( +-  0.06% ) [100.00%]
B:         2,029,313      context-switches    #  0.716 K/sec  ( +-  0.05% ) [100.00%]
C:         2,028,566      context-switches    #  0.716 K/sec  ( +-  0.06% ) [100.00%]
D:         2,028,895      context-switches    #  0.716 K/sec  ( +-  0.06% ) [100.00%]
E:         2,031,008      context-switches    #  0.717 K/sec  ( +-  0.09% ) [100.00%]
F:         2,028,132      context-switches    #  0.715 K/sec  ( +-  0.05% ) [100.00%]

A:            52,421      migrations          #  0.018 K/sec  ( +-  1.13% )
B:            52,049      migrations          #  0.018 K/sec  ( +-  1.02% )
C:            51,365      migrations          #  0.018 K/sec  ( +-  0.92% )
D:            51,766      migrations          #  0.018 K/sec  ( +-  1.11% )
E:            53,047      migrations          #  0.019 K/sec  ( +-  1.08% )
F:            51,447      migrations          #  0.018 K/sec  ( +-  0.86% )

A:     709.528485252 seconds time elapsed  ( +-  0.02% )
B:     708.976557288 seconds time elapsed  ( +-  0.04% )
C:     709.312844791 seconds time elapsed  ( +-  0.02% )
D:     709.400050112 seconds time elapsed  ( +-  0.01% )
E:     708.914562508 seconds time elapsed  ( +-  0.06% )
F:     709.602255085 seconds time elapsed  ( +-  0.02% )

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.