Issue 74263
Summary [Flang] TSVC s314: `fcmp` doesn't have fast-math flags
Labels performance, flang:ir
Assignees
Reporter yus3710-fj
    Flang can't vectorize the loop in `s314` of [TSVC](https://www.netlib.org/benchmark/vectors) while Clang can vectorize the loop written in C.

```fortran
! Fortran version
      do 1 nl = 1,ntimes
      x = a(1)
      do 10 i = 2,n
        if(a(i) .gt. x) x = a(i)
   10 continue
      call dummy(ld,n,a,b,c,d,e,aa,bb,cc,x)
   1  continue
```

```c
// C version
for (int nl = 0; nl < ntimes; nl++) {
  x = a[0];
  for (int i = 1; i < n; i++) {
    if (a[i] > x) {
      x = a[i];
    }
 }
  dummy(a, b, c, d, e, aa, bb, cc, x);
}
```

```console
$ flang-new -v -Ofast s314.f -S -Rpass=vector
flang-new version 18.0.0 (https://github.com/llvm/llvm-project.git 1c1227846425883a3d39ff56700660236a97152c)
Target: aarch64-unknown-linux-gnu
Thread model: posix
InstalledDir: /path/to/install/bin
Found candidate GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Selected GCC installation: /path/to/lib/gcc/aarch64-unknown-linux-gnu/11.2.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
 "/path/to/install/bin/flang-new" -fc1 -triple aarch64-unknown-linux-gnu -S -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -ffast-math -target-cpu generic -target-feature +neon -target-feature +v8a -fstack-arrays -fversion-loops-for-stride -Rpass=vector -O3 -o s314.s -x f95-cpp-input s314.f
$ clang -Ofast s314.c -Rpass=vector
/path/to/s314.c:17:3: remark: vectorized loop (vectorization width: 4, interleaved count: 2) [-Rpass=loop-vectorize]
   17 | for (int i = 0; i < LEN; i++) {
      | ^
```

`fcmp` should have fast-math flags to be recognized as max reduction but Flang doesn't support.

```
.lr.ph: ; preds = %.lr.ph.preheader, %26
  %indvars.iv = phi i64 [ 2, %.lr.ph.preheader ], [ %indvars.iv.next, %26 ]
  %22 = phi float [ %18, %.lr.ph.preheader ], [ %27, %26 ]
  %gep = getelementptr float, ptr getelementptr ([1000 x float], ptr @_QMmodEa, i64 -1, i64 999), i64 %indvars.iv, !dbg !21
  %23 = load float, ptr %gep, align 4, !dbg !21, !tbaa !15
  %24 = fcmp ogt float %23, %22, !dbg !21
  br i1 %24, label %25, label %26, !dbg !21

25: ; preds = %.lr.ph
  store float %23, ptr %12, align 4, !dbg !23, !tbaa !15
  br label %26, !dbg !21

26: ; preds = %25, %.lr.ph
  %27 = phi float [ %23, %25 ], [ %22, %.lr.ph ]
  %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1, !dbg !24
  %exitcond.not = icmp eq i64 %indvars.iv, %21, !dbg !21
  br i1 %exitcond.not, label %._crit_edge, label %.lr.ph, !dbg !21
```
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs

Reply via email to