Issue 162550
Summary A pragma incorrectly suppresses FMA in the IR without `-ffp-contract=fast-honor-pragmas`
Labels new issue
Assignees
Reporter wjristow
    For code like the following:

    // -------------- simple.cpp -------------- //
    float compute(float a, float b, float c) {
    #if defined(ENABLE_PRAGMA)
    #pragma clang fp contract (off)
    #endif
      float product = a * b;
      return product + c;
    }
    // ---------------------------------------- //

When `-ffast-math` is used, the cross-statement FMA should happen (and it does). The pragma that turns OFF fp contraction only takes effect with an additional switch, `-ffp-contract=fast-honor-pragmas`. Without that switch, the FMA _still_ happens even when the pragma is enabled:

    $ clang++ -S -mfma -O2 -ffast-math -o - simple.cpp | egrep 'mul|add'
            vfmadd213ss %xmm2, %xmm1, %xmm0     # xmm0 = (xmm1 * xmm0) + xmm2
            .addrsig
    $ clang++ -DENABLE_PRAGMA -S -mfma -O2 -ffast-math -o - simple.cpp | egrep 'mul|add'
            vfmadd213ss     %xmm2, %xmm1, %xmm0     # xmm0 = (xmm1 * xmm0) + xmm2
            .addrsig
    $

The cross-statement FMA is suppressed only when the additional switch `-ffp-contract=fast-honor-pragmas` is applied after `-ffast-math` (as documented):

    $ clang++ -DENABLE_PRAGMA -S -mfma -O2 -ffast-math -ffp-contract=fast-honor-pragmas -o - simple.cpp | egrep 'mul|add'
    clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
            vmulss  %xmm0, %xmm1, %xmm0
            vaddss  %xmm2, %xmm0, %xmm0
            .addrsig
    $

However, when IR is generated, the cross-statement FMA is suppressed when the pragma is enabled, both with and without the `-ffp-contract=fast-honor-pragmas` switch:

    $ clang++ -S -mfma -emit-llvm -O2 -ffast-math -o - simple.cpp | egrep 'fmul|fadd' # All the fast-math-flags are on.
      %mul = fmul fast float %b, %a
      %add = fadd fast float %mul, %c
    $ clang++ -DENABLE_PRAGMA -S -mfma -emit-llvm -O2 -ffast-math -o - simple.cpp | egrep 'fmul|fadd' # The 'contract' fast-math-flag is suppressed (INCORRECT).
      %mul = fmul reassoc nnan ninf nsz arcp afn float %b, %a
      %add = fadd reassoc nnan ninf nsz arcp afn float %mul, %c
    $ clang++ -DENABLE_PRAGMA -S -mfma -emit-llvm -O2 -ffast-math -ffp-contract=fast-honor-pragmas -o - simple.cpp | egrep 'fmul|fadd' # The 'contract' fast-math-flag is suppressed (correct).
    clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
      %mul = fmul reassoc nnan ninf nsz arcp afn float %b, %a
      %add = fadd reassoc nnan ninf nsz arcp afn float %mul, %c
    $

A consequence is that with LTO enabled, a pragma that disables fp contraction behaves differently than it does in a non-LTO build. The cross-statement FMA is only _supposed_ to be suppressed by the pragma when `-ffp-contract=fast-honor-pragmas` is specified (to make the pragma effective). But with LTO it is _always_ suppressed by the pragma, even without that switch.

Here is a standalone runnable test case to illustrate. It contains a cross-statement FMA opportunity, and the input values are chosen so that performing the FMA produces a small numeric difference.

    $ cat lto_test.cpp
    // 'noinline' just to make it easy to inspect the generated code.
    __attribute__((noinline)) float compute(float a, float b, float c) {
    #pragma clang fp contract (off)
      float product = a * b;
      return product + c;
    }

    // Declare 'volatile' to suppress compile-time folding:
    volatile float x = 1.7200003f;
    volatile float y = 2.0720003f;
    volatile float z = 3.5720001f;

    extern "C" int printf(const char *, ...);

    int main() {
      float result = compute(x, y, z);
      // Result depends on whether FMA happens:
      //   FMA does happen:     7.1358409e+00
      //   FMA does not happen: 7.1358414e+00
      printf("Result: %.7e\n", (double) result);
      return 0;
    }
    $
    $ # `-ffast-math` enables FMA, but the pragma suppresses it:
    $ clang++ -o test.no_fma.pragma.elf -mfma -O2 -ffast-math -ffp-contract=fast-honor-pragmas lto_test.cpp
    clang++: warning: overriding '-ffast-math' option with '-ffp-contract=fast-honor-pragmas' [-Woverriding-option]
    $ test.no_fma.pragma.elf
    Result: 7.1358414e+00
    $
    $ # `-ffast-math` is not on, so cross-statement FMA does not happen:
    $ clang++ -o test.no_fma.elf -mfma -O2 lto_test.cpp
    $ test.no_fma.elf
    Result: 7.1358414e+00
    $
    $ # `-ffast-math` is on, so cross-statement FMA does happen:
    $ clang++ -o test.yes_fma.elf -mfma -O2 -ffast-math lto_test.cpp
    $ test.yes_fma.elf
    Result: 7.1358409e+00
    $
    $ # Same as prev but with LTO enabled, so FMA should happen, but it does not (the bug):
    $ clang++ -o test.should_be_yes_fma.lto.elf -mfma -flto -O2 -ffast-math lto_test.cpp
    $ test.should_be_yes_fma.lto.elf
    Result: 7.1358414e+00
    $

For reference, here are some points of discussion about the `-ffp-contract=fast-honor-pragmas` concept:
https://discourse.llvm.org/t/fp-contract-fast-and-pragmas/58529
https://discourse.llvm.org/t/rfc-honor-pragmas-with-ffp-contract-fast/80797/14

As an aside, I came across this because at PlayStation we want to _always_ honor the pragma. In fact, we have had private changes in place in our downstream code to honor the pragma since our llvm11-based release. That predates the concept of `-ffp-contract=fast-honor-pragmas`, so at the time we considered the pragma not being honored to be simply a bug. I have proposed a patch to always honor the pragmas for PlayStation: https://github.com/llvm/llvm-project/pull/162549

Because of this bug, the test case for that patch doesn't use the usual approach of checking the generated IR; it checks the generated assembly code instead.
_______________________________________________
llvm-bugs mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
