https://llvm.org/bugs/show_bug.cgi?id=30939
Bug ID: 30939
Summary: bubble sort test performance is 2 times worse with -unroll-runtime-epilog
Product: libraries
Version: trunk
Hardware: PC
OS: Linux
Status: NEW
Severity: normal
Priority: P
Component: Loop Optimizer
Assignee: unassignedb...@nondot.org
Reporter: evstu...@gmail.com
CC: llvm-bugs@lists.llvm.org
Classification: Unclassified

Created attachment 17562
  --> https://llvm.org/bugs/attachment.cgi?id=17562&action=edit
test compiled with epilog

It is unclear why prologue and epilogue runtime unrolling perform so differently in this case. On X86, performance differs by a factor of 2 when SingleSource/Benchmarks/Stanford/Bubblesort.c is compiled with

    -O2 -march=core-avx2 -mllvm -unroll-runtime-epilog=true    (bad case)

and

    -O2 -march=core-avx2 -mllvm -unroll-runtime-epilog=false   (good case)

Attached are the assemblies from the current compiler:
    bs_epil.s
    bs_prol.s
and the assembly of the hottest loop:
    bs_epil_loop.s
    bs_prol_loop.s

The code looks very similar, and with some assembly modifications I was able to make the hottest loops identical while keeping the same performance gap (2 times).

Deeper analysis showed that the hottest loop (99% of execution time) is dominated by stalls from unpredictable branches:

    while ( i<top ) {
        if ( sortlist[i] > sortlist[i+1] ) {
            j = sortlist[i];
            sortlist[i] = sortlist[i+1];
            sortlist[i+1] = j;
        }
        i=i+1;
    }

sortlist is a randomly filled array, so the comparison in the loop is completely unpredictable. The distance between branches is very short. This makes the test very sensitive to code shifts and to the order of memory accesses (as both influence branch prediction in the loop).

See related discussions:
https://reviews.llvm.org/D18158
https://reviews.llvm.org/D24593
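For readers who want to exercise the hot loop outside the benchmark, the following is a minimal sketch of the loop quoted above wrapped in a standalone pass, assuming a 0-based array and a small linear congruential generator for the fill. The names lcg_next, fill_random, bubble_pass, bubble_sort and is_sorted are hypothetical helpers introduced here for illustration, the generator is not the one used by Bubblesort.c, and the swap counter is added instrumentation that is not in the original loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: a small LCG so the fill is reproducible.
   Assumption: this is NOT the generator used by Bubblesort.c. */
static uint32_t lcg_next(uint32_t *state) {
    *state = *state * 1664525u + 1013904223u;
    return *state;
}

/* Fill sortlist[0..n-1] with pseudo-random values in 0..65535. */
static void fill_random(int *sortlist, size_t n, uint32_t seed) {
    for (size_t k = 0; k < n; k++)
        sortlist[k] = (int)(lcg_next(&seed) >> 16);
}

/* One pass of the loop from the report over sortlist[first..top].
   The returned count of taken swap branches is added instrumentation. */
static size_t bubble_pass(int *sortlist, size_t first, size_t top) {
    size_t taken = 0;
    size_t i = first;
    while (i < top) {
        if (sortlist[i] > sortlist[i + 1]) {
            int j = sortlist[i];
            sortlist[i] = sortlist[i + 1];
            sortlist[i + 1] = j;
            taken++;
        }
        i = i + 1;
    }
    return taken;
}

/* Full sort: repeat the pass with a shrinking upper bound.
   Returns the total number of swaps performed. */
static size_t bubble_sort(int *sortlist, size_t n) {
    size_t total = 0;
    assert(n >= 1);
    for (size_t top = n - 1; top > 0; top--)
        total += bubble_pass(sortlist, 0, top);
    return total;
}

/* Returns 1 if a[0..n-1] is in non-decreasing order, 0 otherwise. */
static int is_sorted(const int *a, size_t n) {
    for (size_t k = 1; k < n; k++)
        if (a[k - 1] > a[k])
            return 0;
    return 1;
}
```

Calling fill_random followed by bubble_sort from a small main, and building that file with each of the two flag sets from the report, gives a reduced case to time. Whether this reduction reproduces the 2x gap is not established by the report; the sensitivity to code placement described above suggests results may vary between builds.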