Issue 76701
Summary prefix-xor should be optimized to carryless multiply
Labels new issue
Assignees
Reporter Validark
    In [this article](https://validark.github.io/mask_even_odd_bits.html#prefix-xor) I explain the prefix-xor operation and explain how it can be computed with a carryless multiply.

Basically, I would like to see LLVM recognize this pattern:

```zig
fn prefixXOR(bitstring: u64) u64 {
    var x = bitstring;
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}
```

and possibly:

```zig
fn prefixXOR(bitstring: u64) u64 {
    var x = bitstring;
    inline for (1..64) |i| { // i ∈ [1, 63]
        x ^= bitstring << i;
    }
    return x;
}
```

And convert it to a carryless multiply:

```zig
fn prefixXOR(bitmask: u64) u64 {
    const all_ones: u64 = @bitCast(@as(i64, -1));
    return @mulCarryless(bitmask, all_ones);
}
```

I believe it would look like so in x86-64 assembly:

```asm
prefixXOR:
        vmovq   xmm0, rdi
        vpcmpeqd        xmm1, xmm1, xmm1
        vpclmulqdq      xmm0, xmm1, xmm0, 0
        vmovq   rax, xmm0
        ret
```
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to