[llvm-bugs] [Bug 24373] New: Performance degradation of ‘fft’ test from eembc.1.1 suite on x86 Avoton-1.7 due to [DAGCombine]-shift changes

via llvm-bugs Thu, 06 Aug 2015 03:40:58 -0700

https://llvm.org/bugs/show_bug.cgi?id=24373


            Bug ID: 24373
           Summary: Performance degradation of ‘fft’ test from eembc.1.1
                    suite on x86 Avoton-1.7 due to [DAGCombine]-shift
                    changes
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: normal
          Priority: P
         Component: Scalar Optimizations
          Assignee: [email protected]
          Reporter: [email protected]
                CC: [email protected], [email protected],
                    [email protected], [email protected],
                    [email protected],
                    [email protected], [email protected],
                    [email protected]
    Classification: Unclassified

Created attachment 14700
  --> https://llvm.org/bugs/attachment.cgi?id=14700&action=edit
test

The performance degradation of the eembc.1.1/fft00 test is caused by commit
rev. 240787, which has the following commit message.

commit 3791d56da63baf5072fa6ecaa872ace6adbc6892
Author: Benjamin Kramer <[email protected]>
Date:   Fri Jun 26 14:51:36 2015 +0000

    [DAGCombine] fold (X >>?,exact C1) << C2 --> X << (C2-C1)

    Instcombine also does this but many opportunities only become visible
    after GEPs are lowered.

    git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@240787 91177308-0d34-0410-b5e6-96231b3b80d8
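A minimal Python sketch of the fold named in the commit title, assuming i32 two's-complement semantics. The helper names (`shl`, `ashr`, `exact_ok`, `folded`) are hypothetical. The commit title only shows the X << (C2-C1) case; the C1 > C2 branch, which yields a right shift, is an assumption inferred from the SAR32ri 15 in the rev. 240787 dump below. The equivalence only holds when the 'exact' flag is valid, i.e. the first shift discards no set bits.

```python
# Hypothetical model of i32 shifts, used to sanity-check the DAGCombine fold
# (X >>exact C1) << C2 on two's-complement 32-bit values.
MASK32 = 0xFFFFFFFF


def to_i32(v):
    """Interpret the low 32 bits of v as a signed i32."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def shl(x, c):
    """i32 shl: bits shifted past bit 31 are discarded."""
    return to_i32(x << c)


def ashr(x, c):
    """i32 ashr: Python's >> on ints is an arithmetic (sign-filling) shift."""
    return to_i32(x) >> c


def exact_ok(x, c):
    """'exact' promises that the shift by c discards only zero bits."""
    return (to_i32(x) & ((1 << c) - 1)) == 0


def folded(x, c1, c2):
    """Assumed result of the combine: a single shift by the difference."""
    if c2 >= c1:
        return shl(x, c2 - c1)   # X << (C2-C1), as in the commit title
    return ashr(x, c1 - c2)      # assumed C1 > C2 case: X >> (C1-C2)


# Compare unfolded and folded forms on a few inputs where 'exact' holds.
for x in [0, 1 << 16, -(1 << 16), 0x7FFF0000, -0x80000000, 12345 << 16]:
    for c1, c2 in [(16, 1), (4, 8), (3, 3)]:
        if exact_ok(x, c1):
            assert shl(ashr(x, c1), c2) == folded(x, c1, c2)

# Assumed fft00 case: ashr exact 16 followed by a shl 1 becomes ashr 15.
print(folded(5 << 16, 16, 1))  # 10
```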

The degradation is in the hottest inner loop of the benchmark, around the load
address calculations. The IR dumps before the ‘Expand ISel Pseudo-instructions’
phase are identical for both revisions and look as follows.

.lr.ph10:                                         ; preds = %33, %.lr.ph10
  %46 = phi i32 [ %sext21, %.lr.ph10 ], [ %44, %33 ]
  %47 = add nsw i32 %46, %29                                   !!
  %sext4 = shl i32 %47, 16                                     !!
  %48 = ashr exact i32 %sext4, 16                              !!
  %49 = getelementptr inbounds [256 x i16], [256 x i16]* %RealBitRevData, i32 0, i32 %48
  %50 = load i16, i16* %49, align 2, !tbaa !2
  %51 = sext i16 %50 to i32
  %52 = mul nsw i32 %51, %38
  %53 = getelementptr inbounds [256 x i16], [256 x i16]* %ImagBitRevData, i32 0, i32 %48
  %54 = load i16, i16* %53, align 2, !tbaa !2
  %55 = sext i16 %54 to i32
  ...
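The shl 16 / ashr exact 16 pair applied to %47 sign-extends its low 16 bits; the `exact` flag is trivially valid because shl 16 clears the low 16 bits. A small Python sketch, with hypothetical helper names, checking that the shift pair computes the same value as a 16-to-32-bit sign extension such as the one movswl performs:

```python
MASK32 = 0xFFFFFFFF


def to_i32(v):
    """Interpret the low 32 bits of v as a signed i32."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def sext_via_shifts(v):
    """%sext4 = shl i32 v, 16 ; %48 = ashr exact i32 %sext4, 16"""
    return to_i32(v << 16) >> 16


def sext_i16(v):
    """What movswl computes: sign-extend the low 16 bits of v to 32 bits."""
    lo = v & 0xFFFF
    return lo - 0x10000 if lo & 0x8000 else lo


# The two agree for any i32 input, including values with upper bits set.
for v in [0, 1, 0x7FFF, 0x8000, 0xFFFF, 0x1FFFF, -1, -32768, 0x12345678]:
    assert sext_via_shifts(v) == sext_i16(v)
```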

In the dumps after the ‘Expand ISel Pseudo-instructions’ phase, the shift pair
has been replaced by a single ‘movswl’ (MOVSX32rr16) instruction in rev. 240786,
but remains in the code untransformed in rev. 240787, which leads to the
degradation. The corresponding machine IR dump fragments for these loads are
the following.

rev. 240786:
----------------
BB#10: derived from LLVM BB %.lr.ph10
    Predecessors according to CFG: BB#9 BB#10
        %vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18
        %vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17
        %vreg63<def> = COPY %vreg62:sub_16bit; GR16:%vreg63 GR32:%vreg62
        %vreg64<def> = MOVSX32rr16 %vreg63<kill>; GR32_NOSP:%vreg64 GR16:%vreg63      !! movswl
        %vreg65<def> = MOVSX32rm16 <fi#0>, 2, %vreg64, 0, %noreg; mem:LD2[%49](tbaa=<0x5761c08>) GR32:%vreg65 GR32_NOSP:%vreg64
        %vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13
        %vreg67<def> = MOVSX32rm16 <fi#1>, 2, %vreg64, 0, %noreg; mem:LD2[%53](tbaa=<0x5761c08>) GR32:%vreg67 GR32_NOSP:%vreg64
        ...

vs.

rev. 240787:
-----------
BB#10: derived from LLVM BB %.lr.ph10
    Predecessors according to CFG: BB#9 BB#10
        %vreg17<def> = PHI %vreg16, <BB#9>, %vreg18, <BB#10>; GR32_NOSP:%vreg17 GR32:%vreg16,%vreg18
        %vreg62<def,tied1> = ADD32rr %vreg17<tied0>, %vreg9, %EFLAGS<imp-def,dead>; GR32:%vreg62,%vreg9 GR32_NOSP:%vreg17
        %vreg63<def,tied1> = SHL32ri %vreg62<tied0>, 16, %EFLAGS<imp-def,dead>; GR32:%vreg63,%vreg62      !!
        %vreg64<def,tied1> = SAR32ri %vreg63<tied0>, 15, %EFLAGS<imp-def,dead>; GR32_NOSP:%vreg64 GR32:%vreg63      !!
        %vreg65<def> = MOVSX32rm16 <fi#0>, 1, %vreg64, 0, %noreg; mem:LD2[%47] GR32:%vreg65 GR32_NOSP:%vreg64
        %vreg66<def,tied1> = IMUL32rr %vreg65<tied0>, %vreg13, %EFLAGS<imp-def,dead>; GR32:%vreg66,%vreg65,%vreg13
        %vreg67<def> = MOVSX32rm16 <fi#1>, 1, %vreg64, 0, %noreg; mem:LD2[%51] GR32:%vreg67 GR32_NOSP:%vreg64
        ...
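The two dumps differ in how the byte offset of each i16 element is formed. In rev. 240786 the index is sign-extended by one MOVSX32rr16 and the element size of 2 is folded into the load's scale operand (`<fi#0>, 2, %vreg64`); in rev. 240787 the scale is 1 and the offset comes from SHL32ri 16 followed by SAR32ri 15. A plausible reading (an inference, not stated in the report) is that after GEP lowering the ×2 appears as a shl by 1, which the new combine merges with the ashr exact 16 into a single sar by 15, so the remaining shl/sar pair no longer matches a 16-bit sign extension. The Python sketch below, with hypothetical function names, only checks that both sequences yield the same offset:

```python
MASK32 = 0xFFFFFFFF


def to_i32(v):
    """Interpret the low 32 bits of v as a signed i32."""
    v &= MASK32
    return v - (1 << 32) if v & 0x80000000 else v


def offset_r240786(idx):
    """MOVSX32rr16 on the low 16 bits, then memory-operand scale 2."""
    lo = idx & 0xFFFF
    sext = lo - 0x10000 if lo & 0x8000 else lo
    return sext * 2


def offset_r240787(idx):
    """SHL32ri 16, then SAR32ri 15, then memory-operand scale 1."""
    return to_i32(idx << 16) >> 15


# Same byte offset into the [256 x i16] arrays either way; only the
# instruction sequence differs (one movswl vs. two shifts per index).
for idx in [0, 1, 255, -1, 0x7FFF, 0x8000, 0x10005, -256]:
    assert offset_r240786(idx) == offset_r240787(idx)
```

Since the results are identical, the regression comes from the extra shift on the loop's critical address path, not from a change in the computed addresses.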

The test fft00.ll and the IR dumps for both revisions are attached. The command
line for reproducing the issue is the following.

clang -m32 -fPIE -fuse-ld=gold -O2 -ffast-math -mfpmath=sse -march=slm -mllvm -print-after-all fft00.ll


Okunev Sergey,
Software Engineer
Intel Compiler Team


_______________________________________________
llvm-bugs mailing list
[email protected]
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs