https://bugs.llvm.org/show_bug.cgi?id=40291

            Bug ID: 40291
           Summary: [LoopVectorize] LV miscompiles loop by incorrect
                    reordering of stores
           Product: new-bugs
           Version: trunk
          Hardware: PC
                OS: Windows NT
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: new bugs
          Assignee: unassignedb...@nondot.org
          Reporter: max.kazant...@azul.com
                CC: htmldevelo...@gmail.com, llvm-bugs@lists.llvm.org

Created attachment 21315
  --> https://bugs.llvm.org/attachment.cgi?id=21315&action=edit
simple.ll

The Loop Vectorizer emits vector stores and scatters in the wrong order,
which leads to a miscompile. To reproduce, download the attached IR and run

opt -loop-vectorize -S simple.ll

The original loop performs the following steps 3 times:
1. Load x from memory;
2. Store (x + 1) to this memory;
3. if (x < 1), store 0 to this memory.

Initially the memory is filled with zeros, so the condition in step 3 is always
true. Therefore, after these steps the memory should still be filled with zeros.
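
The three steps can be modelled outside of LLVM to confirm the expected result. The following is only a sketch in Python, not the attached simple.ll: the function name `run_scalar_loop`, the array size and the indexing scheme are assumptions made for illustration.

```python
def run_scalar_loop(memory, steps=3):
    """Hypothetical model of the scalar loop described above.

    The real loop lives in the attached simple.ll; the indexing used
    here (one cell per step) is an assumption for illustration only.
    """
    for i in range(steps):
        x = memory[i]        # 1. load x from memory
        memory[i] = x + 1    # 2. store (x + 1) to the same cell
        if x < 1:            # 3. the condition uses the loaded value
            memory[i] = 0    #    so the later store of 0 wins
    return memory


result = run_scalar_loop([0, 0, 0])
```

With zero-initialized memory, `result` is all zeros again, because the store in step 3 always follows the store in step 2 to the same cell.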

After vectorization, we see the following pattern (I omit the instructions that
are irrelevant):

vector.body:                                      ; preds = %vector.body, %vector.ph
...
  call void @llvm.masked.scatter.v16i32.v16p0i32(<16 x i32> zeroinitializer, <16 x i32*> %25, i32 4, <16 x i1> %30), !alias.scope !0, !noalias !3
...
  call void @llvm.masked.scatter.v16i32.v16p0i32(<16 x i32> zeroinitializer, <16 x i32*> %35, i32 4, <16 x i1> %44), !alias.scope !0, !noalias !3
...
  store <48 x i32> %interleaved.vec, <48 x i32>* %50, align 4
...
  call void @llvm.masked.scatter.v16i32.v16p0i32(<16 x i32> zeroinitializer, <16 x i32*> %46, i32 4, <16 x i1> %66), !alias.scope !0, !noalias !3

The store is the vector store of (x + 1) to all array elements, and the
scatters are the masked stores of zeros to the elements that need them
(effectively, all elements). The correct sequence is therefore to store the
non-zero vector first and then perform the three scatters of zeros. The
vectorizer instead emits 2 scatters before the store, so 2/3 of the memory
cells are not zeroed properly.
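
The effect of the ordering can be reproduced with a small Python model. This is a sketch under stated assumptions, not the vectorizer's actual code: the lane count 16 and the interleave factor 3 are read off the `<16 x i32>` and `<48 x i32>` types above, while the helper names and the mapping of each scatter to one member of the interleave group (elements k, k + 3, k + 6, ...) are hypothetical.

```python
VF = 16     # lanes per scatter, taken from <16 x i32>
GROUP = 3   # interleave factor, taken from <48 x i32> = 3 * 16


def wide_store(memory, values):
    """Model of the <48 x i32> store: overwrite every element."""
    memory[:] = values


def masked_scatter(memory, indices, mask, value=0):
    """Model of llvm.masked.scatter: write value where the mask is set."""
    for idx, enabled in zip(indices, mask):
        if enabled:
            memory[idx] = value


def simulate(order):
    """Apply the operations in the given order to zero-filled memory.

    order holds "store" or a group member index k. Assigning scatter k
    to elements k, k + 3, k + 6, ... is an assumption for illustration.
    """
    memory = [0] * (VF * GROUP)
    loaded = list(memory)                # all loads happen up front
    plus_one = [x + 1 for x in loaded]   # the (x + 1) vector
    mask = [x < 1 for x in loaded]       # the (x < 1) mask
    for op in order:
        if op == "store":
            wide_store(memory, plus_one)
        else:
            indices = range(op, VF * GROUP, GROUP)
            masked_scatter(memory, indices, [mask[i] for i in indices])
    return memory


expected = simulate(["store", 0, 1, 2])   # store first, then 3 scatters
observed = simulate([0, 1, "store", 2])   # order emitted by the vectorizer
nonzero = sum(1 for v in observed if v != 0)
```

In this model `expected` is all zeros, while `observed` keeps the value 1 in 32 of the 48 cells, which matches the 2/3 figure above.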
