https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88533
Bug ID: 88533
Summary: [9 Regression] Higher performance penalty of
array-bounds checking for sparse-matrix vector
multiply
Product: gcc
Version: 9.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
Assignee: unassigned at gcc dot gnu.org
Reporter: anlauf at gmx dot de
Target Milestone: ---
Created attachment 45249
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=45249&action=edit
Fortran code
With gcc 9 I am seeing a higher performance penalty from array-bounds
checking than with gcc 7 and 8, in particular for sparse-matrix (CSC)
vector multiplication.
The attached, semi-reduced test case, which only needs the provided meta-data
but otherwise uses random elements, should be sufficient for demonstration.
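For readers unfamiliar with the CSC layout, the kernel in question has roughly
the following shape. This is only a minimal sketch and not the attached test
case: the module name, the routine name and the array names (colptr, rowidx,
val) are hypothetical and chosen for illustration.

```fortran
! Illustrative sketch only; this is NOT the attached test case.
! All names (csc_spmv_sketch, csc_matvec, colptr, rowidx, val) are hypothetical.
module csc_spmv_sketch
  implicit none
  integer, parameter :: dp = kind (1.0d0)
contains
  ! Compute y = A*x for a matrix A stored in compressed sparse column form:
  !   val(k)     nonzero values, stored column by column
  !   rowidx(k)  row index of val(k)
  !   colptr(j)  position in val/rowidx where column j starts,
  !              with colptr(ncol+1) = nnz + 1
  subroutine csc_matvec (colptr, rowidx, val, x, y)
    integer,  intent(in)  :: colptr(:), rowidx(:)
    real(dp), intent(in)  :: val(:), x(:)
    real(dp), intent(out) :: y(:)
    integer :: j, k
    y = 0.0_dp
    do j = 1, size (x)                     ! loop over columns
       do k = colptr(j), colptr(j+1) - 1   ! nonzeros of column j
          ! Indirect access through rowidx(k): these subscripts are the
          ! kind of reference that -fcheck=bounds verifies at run time.
          y(rowidx(k)) = y(rowidx(k)) + val(k) * x(j)
       end do
    end do
  end subroutine csc_matvec
end module csc_spmv_sketch
```

The inner loop is short and indexes y indirectly, so any per-access check sits
directly in the hot path; that is the situation the timings below refer to.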
I have tested on an i5-8250U and tuned the "outer loop" so that the testcase
runs in 1-2 seconds on that machine. For that purpose, I have used some
feedback provided to my initial posting on gcc-help, see
https://gcc.gnu.org/ml/gcc-help/2018-12/msg00041.html
Tested compilers:
gcc-7.3.1 20180323 [gcc-7-branch revision 258812]
gcc-8.2.1 20181202
gcc-9.0.0 20181214
baseline options: -O2 -ftree-vectorize -g -march=skylake -mfpmath=sse
7: 1.12
8: 1.12
9: 1.12
baseline + -funroll-loops :
7: 1.00
8: 1.00
9: 0.99
baseline + -funroll-loops -fcheck=bounds :
7: 1.56
8: 1.56
9: 1.93
baseline + -funroll-loops -fcheck=bounds -fno-tree-ch :
7: 1.78
8: 1.80
9: 1.83
baseline + -funroll-loops -fno-tree-ch :
7: 1.05
8: 1.09
9: 1.09
Preliminary conclusions:
- -funroll-loops is helpful here
- -fcheck=bounds is quite expensive with current 9.0
(1.93 vs. 1.56 for 7 and 8 when combined with -funroll-loops)
- -fno-tree-ch brings the different versions in line: with bounds checking
enabled it benefits 9, but is worse for 7 and 8
- there are no options above that bring 9 to the level of 7 and 8
as long as bounds-checking is desired.