https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121949
            Bug ID: 121949
           Summary: Missed shift vectorization when IV value has a different
                    datatype
           Product: gcc
           Version: 16.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
            Blocks: 53947
  Target Milestone: ---

I think the solution to this is probably the same as in
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119860#c1 but I'm filing it as a
separate ticket so there is something to test with.

The following example:

void f1(long long word, long long* acc) {
    for (long long row = 0; row < 64; ++row) {
        if (word & (1ull << row)) {
            acc[row] += row;
        }
    }
}

void f2(long long word, long long* acc) {
    for (int row = 0; row < 64; ++row) {
        if (word & (1ull << row)) {
            acc[row] += row;
        }
    }
}

compiled with -O3 -march=armv8-a+sve vectorizes f1 but not f2.

This is because the shift amount "row" is 32 bits wide while the datatype of
the shift is 64 bits.

It seems the vectorizer doesn't support increasing the VF and simply extending
the value to 64 bits in this case, and instead refuses to vectorize.

While the optimal solution may be to just extend row to a 64-bit IV, it's
unclear why we don't support unpacking in this case.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
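For reference, the "extend row to a 64-bit IV" option mentioned in the report can be sketched at the source level as below. This is only an illustration: f2_widened is a hypothetical name, not part of the report, and the rewrite has the same loop shape as f1. It shows that the 32-bit and 64-bit induction variables compute the same result, so widening the IV would be a semantics-preserving transformation here. Whether a given GCC revision vectorizes it has not been checked for this sketch.

```c
/* Shape from the report: a 32-bit induction variable (IV) is used as the
   shift amount of a 64-bit shift. */
void f2(long long word, long long *acc)
{
    for (int row = 0; row < 64; ++row) {
        if (word & (1ull << row)) {
            acc[row] += row;
        }
    }
}

/* Hypothetical source-level rewrite (name is an assumption): the IV is
   64 bits wide, so the shift amount, the shifted value and the array
   elements all share one width. Same loop shape as f1 in the report. */
void f2_widened(long long word, long long *acc)
{
    for (long long row = 0; row < 64; ++row) {
        if (word & (1ull << row)) {
            acc[row] += row;
        }
    }
}
```

Since row only ranges over 0..63, both loops visit the same elements and add the same values, which makes the pair usable as a quick equivalence check when testing a fix.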