eisenwave wrote:

FYI I compared some of this in a benchmark:

<img width="1735" height="975" alt="image" src="https://github.com/user-attachments/assets/ef03118e-9d36-4b97-ac1e-5dfb79fa5d1c" />

https://quick-bench.com/q/lqt9N8l715lwl9I4On2-hNdrV_o

The `naive` versions are simple linear loops, but I did make them branchless. 
The `fast` versions are the Hacker's Delight algorithms, and `native` is just 
using the `pext`/`pdep` instructions.
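
For readers without the benchmark open, here is a rough sketch of what the first two variants can look like for a 32-bit `pext`. This is not the benchmark code: the function names `pext_naive`, `pext_fast`, and `self_check` are made up for illustration, and `pext_fast` follows the `compress` routine as I recall it from Hacker's Delight (section 7-4). The `native` variant would presumably be the BMI2 intrinsic and is left out because it needs hardware and compiler-flag support.

```cpp
#include <cassert>
#include <cstdint>

// Naive, branchless bit extract: visit all 32 bit positions and append each
// bit of x selected by the mask m at the next free position of the result.
// (Illustrative name, not taken from the benchmark.)
std::uint32_t pext_naive(std::uint32_t x, std::uint32_t m) {
    std::uint32_t result = 0;
    unsigned k = 0; // next free output bit position
    for (unsigned i = 0; i < 32; ++i) {
        std::uint32_t mbit = (m >> i) & 1u;
        result |= ((x >> i) & mbit) << k; // contributes 0 when mbit == 0
        k += mbit;                        // advance only for selected bits
    }
    return result;
}

// Parallel-suffix "compress" in the style of Hacker's Delight: five rounds,
// each moving the selected bits right by 1, 2, 4, 8, and 16 positions.
// (Illustrative name, not taken from the benchmark.)
std::uint32_t pext_fast(std::uint32_t x, std::uint32_t m) {
    x &= m;                      // clear bits that are not selected
    std::uint32_t mk = ~m << 1;  // used to count zeros to the right of each bit
    for (unsigned i = 0; i < 5; ++i) {
        std::uint32_t mp = mk ^ (mk << 1); // parallel suffix (prefix XOR)
        mp ^= mp << 2;
        mp ^= mp << 4;
        mp ^= mp << 8;
        mp ^= mp << 16;
        std::uint32_t mv = mp & m;         // mask bits that move this round
        m = (m ^ mv) | (mv >> (1u << i));  // compress the mask
        std::uint32_t t = x & mv;
        x = (x ^ t) | (t >> (1u << i));    // compress the data the same way
        mk &= ~mp;
    }
    return x;
}

// Small cross-check that both sketches agree on a few hand-picked inputs.
void self_check() {
    const std::uint32_t xs[] = {0u, 0xB6u, 0x12345678u, 0xFFFFFFFFu};
    const std::uint32_t ms[] = {0u, 0xF0u, 0x0F0F0F0Fu, 0x80000001u, 0xFFFFFFFFu};
    for (std::uint32_t x : xs)
        for (std::uint32_t m : ms)
            assert(pext_naive(x, m) == pext_fast(x, m));
}
```

The naive loop always runs 32 iterations with a data-dependent shift, while the other version runs a fixed five rounds of shifts and XORs regardless of the mask, which is one plausible reason it holds up on all-zero data too.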

Making the naive versions branching makes them much slower.

So for the codegen, I'm definitely going to go with the Hacker's Delight 
algorithm because it always seems to beat the naive form. To be fair, this 
benchmark is not fully representative because it uses uniformly random masks 
and inputs, which is not a realistic situation in practice. However, the 
Hacker's Delight versions are faster even when testing on 100% zeroed data.

https://github.com/llvm/llvm-project/pull/200114
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
