Hi,

On 05/06/16 19:17, Jeff Hain wrote:
>
> While playing around with Math.round(double) code,
> I found out that
>
>     if (longBits < 0) {
>         r = -r;
>     }
>
> can be replaced with:
>
>     long bitsSignum = (((longBits >> 63) << 1) + 1); // 2*0+1 = 1, or 2*-1+1 = -1
>     r *= bitsSignum;
>
> which seems a bit faster, as one could expect due to less branching.

Not necessarily.  I get this with the original:

    cmp x10, #0x0
    mov x13, xzr
    sub x13, x13, x12
    csel x10, x13, x12, lt

and with yours:

    asr x12, x10, #63
    lsl x12, x12, #1
    add x12, x12, #0x1
    mul x10, x10, x12

(This is AArch64, but x86 is similar.)

The problem is that most of the instructions in the former are
single-cycle, but MUL has a five-cycle latency.  There is also a
conditional negate instruction, which the C2 compiler isn't smart
enough to always generate at the moment, but we will fix that.

Andrew.
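
For anyone who wants to check that the two forms quoted above really
compute the same thing, here is a minimal sketch.  The class and method
names are made up for illustration, and r is assumed to be a long, as in
the quoted snippet.  It only checks equivalence; it says nothing about
which form is faster, which would need a proper benchmark harness and a
look at the generated code.

```java
public class SignNegate {
    // Branching form from the thread: negate r when longBits is negative.
    static long negateWithBranch(long longBits, long r) {
        if (longBits < 0) {
            r = -r;
        }
        return r;
    }

    // Multiply form from the thread: (longBits >> 63) is 0 or -1,
    // so bitsSignum is 2*0+1 = 1 or 2*-1+1 = -1.
    static long negateWithMultiply(long longBits, long r) {
        long bitsSignum = ((longBits >> 63) << 1) + 1;
        return r * bitsSignum;
    }

    public static void main(String[] args) {
        // Hypothetical sample values, including the overflow edge cases.
        long[] samples = {0L, 1L, -1L, 42L, -42L, Long.MAX_VALUE, Long.MIN_VALUE};
        for (long bits : samples) {
            for (long r : samples) {
                if (negateWithBranch(bits, r) != negateWithMultiply(bits, r)) {
                    throw new AssertionError("mismatch for bits=" + bits + ", r=" + r);
                }
            }
        }
        System.out.println("both forms agree on all samples");
    }
}
```

Both forms wrap the same way for r == Long.MIN_VALUE, so they agree even
on that edge case.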