https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88459

            Bug ID: 88459
           Summary: vectorization failure for a simple sum reduction loop
           Product: gcc
           Version: 9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jiangning.liu at amperecomputing dot com
  Target Milestone: ---

gcc -O3 fails to vectorize the simple sum reduction loop below.

unsigned int tmp[1024];
unsigned int test_vec(int n)
{
        int sum = 0;
        for(int i = 0; i < 1024; i++)
        {
                sum += tmp[i];
        }
        return sum;
}

The generated kernel loop (AArch64) is scalar:

.L2:
        ldr     w2, [x1], 4
        add     w0, w0, w2
        cmp     x3, x1
        bne     .L2


However, if we change the data type of sum from "int" to "unsigned int" as below,

unsigned int tmp[1024];
unsigned int test_vec(int n)
{
        unsigned int sum = 0;
        for(int i = 0; i < 1024; i++)
        {
                sum += tmp[i];
        }
        return sum;
}

gcc vectorizes it, and the kernel loop becomes:

.L2:
        ldr     q1, [x0], 16
        add     v0.4s, v0.4s, v1.4s
        cmp     x1, x0
        bne     .L2
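
To make the difference between the two test cases explicit, here is a minimal sketch with the implicit conversions of the "int" variant written out. Under the usual arithmetic conversions, "sum += tmp[i]" with an int sum adds in unsigned int and converts the result back to int on every iteration. Whether that conversion is what blocks the vectorizer is only an assumption here, not something confirmed above. The names test_vec_int_explicit, test_vec_unsigned and fill_tmp are hypothetical and exist only for this illustration. The equality of the two results relies on GCC's documented behavior of reducing out-of-range unsigned-to-int conversions modulo 2^N.

```c
#define N 1024

unsigned int tmp[N];

/* The "int sum" variant with its implicit conversions spelled out:
   the addition happens in unsigned int, then the result is narrowed
   back to int (implementation-defined; modulo 2^N on GCC). */
unsigned int test_vec_int_explicit(void)
{
        int sum = 0;
        for (int i = 0; i < N; i++)
        {
                sum = (int)((unsigned int)sum + tmp[i]);
        }
        return (unsigned int)sum;
}

/* The "unsigned int sum" variant: no conversion inside the loop. */
unsigned int test_vec_unsigned(void)
{
        unsigned int sum = 0;
        for (int i = 0; i < N; i++)
        {
                sum += tmp[i];
        }
        return sum;
}

/* Hypothetical helper: fill tmp with pseudo-random values so that
   the sum wraps around and exercises the conversion. */
void fill_tmp(unsigned int seed)
{
        for (int i = 0; i < N; i++)
        {
                seed = seed * 1664525u + 1013904223u;
                tmp[i] = seed;
        }
}
```

With GCC both functions should return the same value for any contents of tmp, so the scalar and vector kernels above compute the same result and the unsigned form can serve as a source-level workaround.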
