https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88459
Bug ID: 88459
Summary: vectorization failure for a simple sum reduction loop
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: jiangning.liu at amperecomputing dot com
Target Milestone: ---

For the simple loop below, gcc -O3 fails to vectorize it.

unsigned int tmp[1024];
unsigned int test_vec(int n)
{
  int sum = 0;
  for(int i = 0; i < 1024; i++)
  {
    sum += tmp[i];
  }
  return sum;
}

The kernel loop is,

.L2:
        ldr     w2, [x1], 4
        add     w0, w0, w2
        cmp     x3, x1
        bne     .L2

But if we change the data type of sum from "int" to "unsigned int" as below,

unsigned int tmp[1024];
unsigned int test_vec(int n)
{
  unsigned int sum = 0;
  for(int i = 0; i < 1024; i++)
  {
    sum += tmp[i];
  }
  return sum;
}

gcc can vectorize it, and the kernel loop is like,

.L2:
        ldr     q1, [x0], 16
        add     v0.4s, v0.4s, v1.4s
        cmp     x1, x0
        bne     .L2
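
As a side note on the semantics of the two variants, here is a minimal sketch and not part of the original report. In the "int" version, the usual arithmetic conversions mean that `sum += tmp[i]` adds in unsigned int and converts the result back to int. That conversion is implementation-defined in C, and GCC documents it as reduction modulo 2^N, so under that assumption both variants should return the same value. The names `fill_tmp`, `sum_int_acc` and `sum_unsigned_acc` are hypothetical and exist only for this illustration.

```c
unsigned int tmp[1024];

/* Hypothetical helper: use large values so the running sum wraps
   past INT_MAX many times. */
void fill_tmp(void)
{
  for (int i = 0; i < 1024; i++)
    tmp[i] = 0xF0000000u + (unsigned int)i;
}

/* Loop from the report with an "int" accumulator. Each step behaves like
   sum = (int)((unsigned int)sum + tmp[i]); the addition itself is unsigned,
   so there is no signed overflow. */
unsigned int sum_int_acc(void)
{
  int sum = 0;
  for (int i = 0; i < 1024; i++)
    sum += tmp[i];
  return sum;
}

/* Loop from the report with an "unsigned int" accumulator. */
unsigned int sum_unsigned_acc(void)
{
  unsigned int sum = 0;
  for (int i = 0; i < 1024; i++)
    sum += tmp[i];
  return sum;
}
```

Calling `fill_tmp()` and then comparing `sum_int_acc()` with `sum_unsigned_acc()` is one way to check that the two accumulator types agree on a given compiler. This sketch does not attempt to explain why the vectorizer handles the two loops differently.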