> As Andrew's other comment notes, the performance details can be
> processor and platform specific so discerning small performance
> differences really needs to be data-driven.

Then here is some data for my configuration (Jdk8u92/Win7/Core i7 4770S):

// arg = 123456.789:
Benchmark                                       Mode   Samples          Score    Score error  Units
n.r.RoundMul.bench_roundJdk_double              thrpt        5  345791733,235   44921158,366  ops/s
n.r.RoundMul.bench_roundJdk_double_DONT_INLINE  thrpt        5  198139328,055   37295625,269  ops/s
n.r.RoundMul.bench_roundJdk_double_EXCLUDE      thrpt        5   28674938,029    2257234,498  ops/s
n.r.RoundMul.bench_roundJdk_double_INLINE       thrpt        5  387090734,860  111385980,336  ops/s
n.r.RoundMul.bench_roundMul_double              thrpt        5  358461230,357    4769420,930  ops/s
n.r.RoundMul.bench_roundMul_double_DONT_INLINE  thrpt        5  230565623,867    6904679,377  ops/s
n.r.RoundMul.bench_roundMul_double_EXCLUDE      thrpt        5   33027875,346     756618,102  ops/s
n.r.RoundMul.bench_roundMul_double_INLINE       thrpt        5  358131402,671    3077717,219  ops/s

// arg = -123456.789:
Benchmark                                       Mode   Samples          Score    Score error  Units
n.r.RoundMul.bench_roundJdk_double              thrpt        5  334887992,224   30791767,263  ops/s
n.r.RoundMul.bench_roundJdk_double_DONT_INLINE  thrpt        5  193664353,776   22112771,184  ops/s
n.r.RoundMul.bench_roundJdk_double_EXCLUDE      thrpt        5   29657900,088   11425756,457  ops/s
n.r.RoundMul.bench_roundJdk_double_INLINE       thrpt        5  391304965,549    9281466,086  ops/s
n.r.RoundMul.bench_roundMul_double              thrpt        5  358014997,332    5162810,933  ops/s
n.r.RoundMul.bench_roundMul_double_DONT_INLINE  thrpt        5  229850524,665    5632201,764  ops/s
n.r.RoundMul.bench_roundMul_double_EXCLUDE      thrpt        5   33221440,252    1037541,018  ops/s
n.r.RoundMul.bench_roundMul_double_INLINE       thrpt        5  358823098,125    6627448,955  ops/s

===> With multiply it's faster when not inlined, but slower when inlined.
For some reason the score error is smaller with multiply.

-Jeff
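
One rough way to read the tables above is to treat each score plus or minus its score error as an interval and check whether the roundJdk and roundMul intervals overlap. The Java sketch below does that for the _INLINE and _DONT_INLINE rows. The class and method names are hypothetical, the values are transcribed with a decimal point instead of the decimal comma, and it assumes the "Score error" column is the half-width of an interval around "Score".

```java
// Hypothetical helper, not part of the benchmark itself.
final class ScoreIntervals {

    // True if [a - aErr, a + aErr] and [b - bErr, b + bErr] share any point.
    public static boolean overlaps(double a, double aErr, double b, double bErr) {
        return (a - aErr) <= (b + bErr) && (b - bErr) <= (a + aErr);
    }

    // Relative difference of b with respect to a, in percent.
    public static double relDiffPercent(double a, double b) {
        return (b - a) / a * 100.0;
    }

    private static void report(String label, double jdk, double jdkErr,
                               double mul, double mulErr) {
        System.out.printf("%-28s mul vs jdk: %+6.1f%%, intervals overlap: %b%n",
                label, relDiffPercent(jdk, mul), overlaps(jdk, jdkErr, mul, mulErr));
    }

    public static void main(String[] args) {
        // arg = 123456.789
        report("INLINE (+)", 387090734.860, 111385980.336, 358131402.671, 3077717.219);
        report("DONT_INLINE (+)", 198139328.055, 37295625.269, 230565623.867, 6904679.377);
        // arg = -123456.789
        report("INLINE (-)", 391304965.549, 9281466.086, 358823098.125, 6627448.955);
        report("DONT_INLINE (-)", 193664353.776, 22112771.184, 229850524.665, 5632201.764);
    }
}
```

Where the intervals overlap, the difference between the two implementations is not clearly resolved by that run. An overlap check like this is only a coarse filter, not a substitute for more samples or a proper statistical test.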