| Issue |
76701
|
| Summary |
prefix-xor should be optimized to carryless multiply
|
| Labels |
new issue
|
| Assignees |
|
| Reporter |
Validark
|
In [this article](https://validark.github.io/mask_even_odd_bits.html#prefix-xor) I explain the prefix-xor operation and explain how it can be computed with a carryless multiply.
Basically, I would like to see LLVM recognize this pattern:
```zig
fn prefixXOR(bitstring: u64) u64 {
var x = bitstring;
x ^= x << 1;
x ^= x << 2;
x ^= x << 4;
x ^= x << 8;
x ^= x << 16;
x ^= x << 32;
return x;
}
```
and possibly:
```zig
fn prefixXOR(bitstring: u64) u64 {
var x = bitstring;
inline for (1..64) |i| { // i ∈ [1, 63]
x ^= bitstring << i;
}
return x;
}
```
And convert it to a carryless multiply:
```zig
fn prefixXOR(bitmask: u64) u64 {
const all_ones: u64 = @bitCast(@as(i64, -1));
return @mulCarryless(bitmask, all_ones);
}
```
I believe it would look like so in x86-64 assembly:
```asm
prefixXOR:
vmovq xmm0, rdi
vpcmpeqd xmm1, xmm1, xmm1
vpclmulqdq xmm0, xmm1, xmm0, 0
vmovq rax, xmm0
ret
```
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs