http://bugs.llvm.org/show_bug.cgi?id=32085

            Bug ID: 32085
           Summary: Extra broadcasts in doubly-unrolled avx2 memcpy loop
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Backend: X86
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected]

At clang head:

$ echo '#include <cstring>
void* go5(int val) {
  int* arr = new int[8 * 128];
  for (int i = 0; i < 8; i++) {
    for (int j = 0; j < 128; j++) {
      memcpy(&arr[i * 128 + j], &val, sizeof(int));
    }
  }
  return arr;
}' | clang++ -O2 -x c++ -g0 --std=c++11 -mavx2 - -o - -S -mllvm --x86-asm-syntax=intel

Output: https://gist.github.com/da5e8e50ba43cf1600ac652b35fd6746

LLVM unrolls both loops, but at the start of each unrolled outer-loop
iteration it broadcasts val into the ymm register again, even though the
register already holds that value:

        vmovd   xmm0, ebx
        vbroadcastss    ymm0, xmm0
        vmovups ymmword ptr [rax], ymm0
        vmovups ymmword ptr [rax + 32], ymm0
        [...]
        vmovd   xmm0, ebx
        vbroadcastss    ymm0, xmm0
        vmovups ymmword ptr [rax + 512], ymm0
        vmovups ymmword ptr [rax + 544], ymm0
        [...]

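We shouldn't need to do this. The value in ebx does not change between
outer-loop iterations, so one vmovd/vbroadcastss pair before the first store
should be enough for all of the stores.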
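For comparing codegen, it may help to have a flat version of the same function next to the original. The sketch below assumes only what the reproducer shows: the loop nest writes val into all 8 * 128 = 1024 ints. The names kRows, kCols, go5_flat and same_result are hypothetical and are not part of the report. Whether the flat form avoids the extra broadcasts has not been checked here; it is only a semantically equivalent baseline.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>

// Dimensions taken from the reproducer (8 outer x 128 inner iterations).
constexpr int kRows = 8;
constexpr int kCols = 128;

// The function from the report, with the literals replaced by constants.
void* go5(int val) {
  int* arr = new int[kRows * kCols];
  for (int i = 0; i < kRows; i++) {
    for (int j = 0; j < kCols; j++) {
      std::memcpy(&arr[i * kCols + j], &val, sizeof(int));
    }
  }
  return arr;
}

// Hypothetical flat variant: a single fill over all 1024 elements.
void* go5_flat(int val) {
  int* arr = new int[kRows * kCols];
  std::fill_n(arr, kRows * kCols, val);
  return arr;
}

// Returns true if both variants produce identical buffers for `val`.
bool same_result(int val) {
  int* a = static_cast<int*>(go5(val));
  int* b = static_cast<int*>(go5_flat(val));
  const bool equal = std::equal(a, a + kRows * kCols, b);
  delete[] a;
  delete[] b;
  return equal;
}
```

A caller could write assert(same_result(42)); to confirm the two forms agree before diffing their assembly.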
