MaxGraey wrote:

You can also try the branchless version:
```cpp
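uint64_t NextPowerOf2_New_Branchless(uint64_t A) {
  // Shift is in [0, 64]; it is 64 exactly when the result would overflow.
  uint64_t Shift = Log2_64_Ceil(A + 1);
  // Mask the count: shifting a 64-bit value by 64 is undefined behavior.
  uint64_t Res = UINT64_C(1) << (Shift & 63);
  // All-ones mask unless Shift == 64, in which case the result is 0.
  return Res & -uint64_t(!(Shift >> 6));
}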
```
https://godbolt.org/z/1Pe3Kz5Mc

With clang, the generated code should be even tighter:
```asm
  mov     edx, 127
  bsr     rdx, rdi
  xor     rdx, 63
  mov     ecx, edx
  neg     cl
  mov     eax, 1
  shl     rax, cl
  test    rdx, rdx
  cmove   rax, rdx
  ret
```

https://github.com/llvm/llvm-project/pull/189160
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits