On Fri, 2 Jul 2021 13:47:40 GMT, Andrew Haley <a...@openjdk.org> wrote:

>> You can also do that branchlessly which might prove better
>> 
>>          long result = Math.multiplyHigh(x, y);
>>          result += (y & (x >> 63));
>>          result += (x & (y >> 63));
>>          return result;
>
> I doubt very much that it would be better, because these days branch 
> prediction is excellent, and we also have conditional select instructions. 
> Exposing the condition helps C2 to eliminate it if the range of args is 
> known. The `if` code is easier to understand.
> 
> Benchmark results, with one of the operands changing signs every iteration, 
> 1000 iterations:
> 
> 
> Benchmark                     Mode  Cnt     Score    Error  Units
> MulHiTest.mulHiTest1 (aph)    avgt    3  1570.587 ± 16.602  ns/op
> MulHiTest.mulHiTest2 (adinn)  avgt    3  2237.637 ±  4.740  ns/op
> 
> In any case, note that with this optimization the unsigned mulHi is in the 
> nanosecond range, so Good Enough. IMO.

Weirdly, it's the other way around on AArch64, though there's little in it:


Benchmark             Mode  Cnt     Score   Error  Units
MulHiTest.mulHiTest1  avgt    3  1492.108 ± 0.301  ns/op
MulHiTest.mulHiTest2  avgt    3  1219.521 ± 1.516  ns/op


but this applies only when the branches are unpredictable. Go with 
simple and easy to understand; it doesn't much matter.
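
For reference, here is a minimal sketch of the two shapes being compared. The branchless method is the quoted suggestion; the `if` method is only my assumption of what the branchy version looks like, since it isn't quoted in this thread. The class and method names are hypothetical and are not the benchmark's actual code.

```java
public final class MulHiSketch {
    // Assumed shape of the `if` variant: start from the signed high word
    // and add a correction for each operand whose sign bit is set.
    public static long mulHiBranchy(long x, long y) {
        long result = Math.multiplyHigh(x, y);
        if (x < 0) result += y;
        if (y < 0) result += x;
        return result;
    }

    // Branchless variant from the quoted suggestion: (x >> 63) is all ones
    // when x is negative and zero otherwise, so the AND selects y or 0.
    public static long mulHiBranchless(long x, long y) {
        long result = Math.multiplyHigh(x, y);
        result += (y & (x >> 63));
        result += (x & (y >> 63));
        return result;
    }

    public static void main(String[] args) {
        long x = -1L, y = -1L; // both are 2^64 - 1 when read as unsigned
        System.out.println(Long.toHexString(mulHiBranchy(x, y)));
        System.out.println(Long.toHexString(mulHiBranchless(x, y)));
    }
}
```

Both methods should return the same value for every input; only the code C2 generates for them differs.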

-------------

PR: https://git.openjdk.java.net/jdk/pull/4644
