| Issue | 168775 |
| Summary | bad vector codegen on loops with wide accumulators |
| Labels | new issue |
| Assignees | |
| Reporter | XeroOl |
```
#include <stdint.h>

uint64_t example_1(uint64_t len, const unsigned char* input) {
    uint64_t total = 0;
    for (uint64_t i = 0; i < len; i++) {
        // example computation to be vectorized
        unsigned char output = input[i] ^ 0x07;
        // accumulator
        total += output;
    }
    return total;
}
```
The loop vectorizer handles the function above very poorly: it chooses a vectorization width of 2, when it should be able to use a much higher width, e.g. 16.
If you pick a narrower accumulator (e.g. change the type of `total` to `uint8_t`), the vectorizer is able to choose a higher width.
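For comparison, here is a minimal sketch of the narrow-accumulator variant described above. The name `example_2` is made up for illustration and is not part of the original report. Note that this variant is not equivalent to `example_1`: the 8-bit accumulator wraps modulo 256, so it only serves to show the difference in the vectorizer's width choice.

```c
#include <stdint.h>

// Hypothetical variant of example_1: identical loop body, but the
// accumulator is 8 bits wide instead of 64 bits.
uint64_t example_2(uint64_t len, const unsigned char* input) {
    uint8_t total = 0;
    for (uint64_t i = 0; i < len; i++) {
        // same example computation as in example_1
        unsigned char output = input[i] ^ 0x07;
        // narrow accumulator: the sum wraps modulo 256
        total += output;
    }
    // widened only once, after the loop
    return total;
}
```

Compiling both functions with clang at `-O3` and adding `-Rpass=loop-vectorize` should print the vectorization width chosen for each loop, which makes the two cases easy to compare.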