>As Andrew's other comment notes, the performance details can be
>processor and platform specific so discerning small performance
>differences really needs to be data-driven.

Here is some data for my configuration (Jdk8u92/Win7/Core i7 4770S):

// arg = 123456.789:
Benchmark                                       Mode   Samples           Score     Score error  Units
n.r.RoundMul.bench_roundJdk_double              thrpt        5   345791733,235    44921158,366  ops/s
n.r.RoundMul.bench_roundJdk_double_DONT_INLINE  thrpt        5   198139328,055    37295625,269  ops/s
n.r.RoundMul.bench_roundJdk_double_EXCLUDE      thrpt        5    28674938,029     2257234,498  ops/s
n.r.RoundMul.bench_roundJdk_double_INLINE       thrpt        5   387090734,860   111385980,336  ops/s
n.r.RoundMul.bench_roundMul_double              thrpt        5   358461230,357     4769420,930  ops/s
n.r.RoundMul.bench_roundMul_double_DONT_INLINE  thrpt        5   230565623,867     6904679,377  ops/s
n.r.RoundMul.bench_roundMul_double_EXCLUDE      thrpt        5    33027875,346      756618,102  ops/s
n.r.RoundMul.bench_roundMul_double_INLINE       thrpt        5   358131402,671     3077717,219  ops/s

// arg = -123456.789:
Benchmark                                       Mode   Samples           Score     Score error  Units
n.r.RoundMul.bench_roundJdk_double              thrpt        5   334887992,224    30791767,263  ops/s
n.r.RoundMul.bench_roundJdk_double_DONT_INLINE  thrpt        5   193664353,776    22112771,184  ops/s
n.r.RoundMul.bench_roundJdk_double_EXCLUDE      thrpt        5    29657900,088    11425756,457  ops/s
n.r.RoundMul.bench_roundJdk_double_INLINE       thrpt        5   391304965,549     9281466,086  ops/s
n.r.RoundMul.bench_roundMul_double              thrpt        5   358014997,332     5162810,933  ops/s
n.r.RoundMul.bench_roundMul_double_DONT_INLINE  thrpt        5   229850524,665     5632201,764  ops/s
n.r.RoundMul.bench_roundMul_double_EXCLUDE      thrpt        5    33221440,252     1037541,018  ops/s
n.r.RoundMul.bench_roundMul_double_INLINE       thrpt        5   358823098,125     6627448,955  ops/s

===>
With multiply it's faster when not inlined (the DONT_INLINE and EXCLUDE
variants), but slower when inlining is forced (the INLINE variant).
For some reason the score error is consistently smaller with multiply.
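
Since some of the score errors are large, it may help to check whether the
score +/- score error ranges of two variants overlap before reading much into
a difference. Below is a minimal sketch of that check. It assumes "Score
error" can be read as the half-width of an interval around "Score" (my
reading of the JMH output, not something established in this thread); the
class and method names are made up for illustration.

```java
// Hypothetical helper, not part of the benchmark: compares two JMH results
// by treating each as the interval [score - error, score + error].
class ScoreIntervals {

    /** Returns true if the two score +/- error intervals intersect. */
    static boolean overlaps(double scoreA, double errA, double scoreB, double errB) {
        return scoreA - errA <= scoreB + errB
            && scoreB - errB <= scoreA + errA;
    }

    public static void main(String[] args) {
        // roundJdk vs roundMul, numbers copied from the tables above.
        System.out.println("INLINE, arg > 0:      "
            + overlaps(387090734.860, 111385980.336, 358131402.671, 3077717.219));
        System.out.println("INLINE, arg < 0:      "
            + overlaps(391304965.549, 9281466.086, 358823098.125, 6627448.955));
        System.out.println("DONT_INLINE, arg > 0: "
            + overlaps(198139328.055, 37295625.269, 230565623.867, 6904679.377));
        System.out.println("DONT_INLINE, arg < 0: "
            + overlaps(193664353.776, 22112771.184, 229850524.665, 5632201.764));
    }
}
```

Under that assumption, the INLINE and DONT_INLINE ranges overlap for the
positive argument and are disjoint for the negative one, so the
negative-argument runs are the ones that separate the two variants most
clearly.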



-Jeff
