Issue: 168775
Summary: Bad vector codegen on loops with wide accumulators
Labels: new issue
Assignees:
Reporter: XeroOl
```
#include <stdint.h>

uint64_t example_1(uint64_t len, const unsigned char* input) {
    uint64_t total = 0;
    for (uint64_t i = 0; i < len; i++) {
        // example computation to be vectorized
        unsigned char output = input[i] ^ 0x07;
        // accumulator
        total += output;
    }
    return total;
}
```
The loop vectorizer handles the function above poorly: it chooses a vectorization width of 2, when it should be able to use a much larger width, e.g. 16.
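
One way to check whether the width of 2 is a cost-model choice rather than a legality limit is to request a width explicitly. The sketch below uses Clang's `#pragma clang loop` hint. The function name `example_1_forced` is hypothetical and not part of the report, and whether the resulting code is actually faster on a given target has not been measured here.

```c
#include <stdint.h>

// Hypothetical variant of example_1: same computation, with an explicit
// vectorization-width hint for the loop vectorizer.
uint64_t example_1_forced(uint64_t len, const unsigned char* input) {
    uint64_t total = 0;
    // Hint only: asks Clang's loop vectorizer for 16 lanes instead of
    // letting the cost model choose. Other compilers may ignore it.
#pragma clang loop vectorize(enable) vectorize_width(16)
    for (uint64_t i = 0; i < len; i++) {
        // same example computation as in example_1
        unsigned char output = input[i] ^ 0x07;
        // 64-bit accumulator, unchanged
        total += output;
    }
    return total;
}
```

Compiling with `-O2 -Rpass=loop-vectorize` should make Clang report the width it used for each vectorized loop, which allows comparing this variant against the original.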


If you pick a narrower accumulator (e.g. change the type of total to uint8_t), the vectorizer is able to choose a higher width.
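
Note that a uint8_t accumulator also changes the result, since the total then wraps modulo 256. A possible source-level workaround that keeps the 64-bit result is to sum fixed-size blocks into a 32-bit accumulator and widen once per block. This is only a sketch: `example_1_blocked` and `BLOCK` are hypothetical names, and it is an assumption, not something verified in this report, that the vectorizer picks a larger width for the inner loop.

```c
#include <stdint.h>

// Hypothetical block size. 65536 bytes * 255 (max byte value) = 16,711,680,
// which fits in a uint32_t, so the partial sum cannot overflow.
#define BLOCK 65536u

uint64_t example_1_blocked(uint64_t len, const unsigned char* input) {
    uint64_t total = 0;
    for (uint64_t base = 0; base < len; base += BLOCK) {
        uint64_t remaining = len - base;
        uint32_t n = remaining < BLOCK ? (uint32_t)remaining : BLOCK;
        // narrow accumulator for the inner loop
        uint32_t partial = 0;
        for (uint32_t i = 0; i < n; i++) {
            // same example computation as in example_1
            partial += (unsigned char)(input[base + i] ^ 0x07);
        }
        // widen to 64 bits once per block
        total += partial;
    }
    return total;
}
```

The result is the same as `example_1` for any input, because each block's partial sum stays below 2^32 before it is added to the 64-bit total.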
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs