https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94077
Kewen Lin <linkw at gcc dot gnu.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2020-08-12
     Ever confirmed|0                           |1
                 CC|                            |linkw at gcc dot gnu.org
             Status|UNCONFIRMED                 |ASSIGNED
--- Comment #1 from Kewen Lin <linkw at gcc dot gnu.org> ---
This issue exists only with gcc8 and gcc9; it is gone with gcc10 and trunk.
The main difference is shown below.
With gcc8/gcc9, the cost model says vectorization is not profitable because of
the high cost of realigning vector loads/stores in the vectorized loop body:
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Cost model analysis:
Vector inside of loop cost: 32
Vector prologue cost: 6
Vector epilogue cost: 0
Scalar iteration cost: 4
Scalar outside cost: 0
Vector outside cost: 6
prologue iterations: 0
epilogue iterations: 0
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: cost model: the vector
iteration cost = 32 divided by the scalar iteration cost = 4 is greater or
equal to the vectorization factor = 4.
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: not vectorized: vectorization
not profitable.
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: not vectorized: vector version
will never be profitable.
With gcc10 and trunk, by contrast, the cost model output looks like:
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Cost model analysis:
Vector inside of loop cost: 6
Vector prologue cost: 0
Vector epilogue cost: 0
Scalar iteration cost: 6
Scalar outside cost: 0
Vector outside cost: 0
prologue iterations: 0
epilogue iterations: 0
Calculated minimum iters for profitability: 0
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Runtime profitability
threshold = 4
gcc/testsuite/gcc.dg/gomp/pr82374.c:27:3: note: Static estimate
profitability threshold = 4
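To make the two verdicts easier to compare, here is a minimal sketch of the
check that the "cost model:" note above describes. The function name and
signature are made up for illustration and are not GCC's internal code; only
the comparison itself is taken from the dump message.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: returns true when the vector loop body can pay off at
   all.  One vector iteration replaces VF scalar iterations, so it has to be
   cheaper than VF scalar iterations.  This is the same condition as
   "vector iteration cost / scalar iteration cost < VF" in the dump.  */
bool
vector_body_can_be_profitable (int vec_inside_cost, int scalar_iter_cost,
                               int vf)
{
  assert (scalar_iter_cost > 0 && vf > 0);
  return scalar_iter_cost * vf > vec_inside_cost;
}
```

With the gcc8/gcc9 figures (32, 4, 4) the comparison is 16 > 32, which is
false, hence "will never be profitable". With the gcc10/trunk figures (6, 6, 4)
it is 24 > 6, which is true, so the analysis goes on to compute a threshold.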
Tracing back, I noticed that the difference comes from:
gcc8/gcc9:
can't force alignment of ref: a[i_12]
gcc10/trunk:
force alignment of a[i_12]
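As a rough source-level analogy for what "force alignment" achieves (this is
an assumption for illustration, not what the vectorizer literally emits): once
the underlying object is known to be aligned to the vector size, the accesses
can use plain aligned vector loads/stores and no realignment cost is charged.
The array name and size below are hypothetical and are not taken from
pr82374.c.

```c
#include <assert.h>
#include <stdint.h>

#define N 1024

/* Hypothetical array; the attribute spells out by hand the 16-byte
   alignment that the compiler would otherwise have to raise itself.  */
int buf[N] __attribute__ ((aligned (16)));

/* Returns nonzero when buf starts on a 16-byte boundary.  */
int
buf_is_16_byte_aligned (void)
{
  return ((uintptr_t) buf % 16) == 0;
}

/* Simple loop over the aligned array, a candidate for vectorization.  */
void
scale_buf (int k)
{
  assert (buf_is_16_byte_aligned ());
  for (int i = 0; i < N; i++)
    buf[i] *= k;
}
```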
I guess it's not a good idea to backport a patch to gcc8/gcc9 just to get the
alignment forced (probably risky?). Instead, I think we can either add
-mefficient-unaligned-vsx alongside -mvsx to ensure that unaligned vector
load/store can be used, or change the target requirement to
powerpc_vsx_ok && vect_hw_misalign. Both preserve the original purpose of the
test.
Hi @Jakub, what do you think of this?
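
For concreteness, the two alternatives might look roughly like the following
in the testcase header. This is only a sketch: the existing directives in
pr82374.c are not reproduced here, the scan pattern is a placeholder, and only
one of the two variants would be applied.

```c
/* Sketch only; not the actual contents of pr82374.c.  */

/* Variant 1: pass -mefficient-unaligned-vsx together with -mvsx.  */
/* { dg-additional-options "-mvsx -mefficient-unaligned-vsx" { target powerpc_vsx_ok } } */

/* Variant 2: keep -mvsx alone, but only expect vectorization on targets
   that also support misaligned vector accesses in hardware.  The pattern
   "vectorized 1 loops" is a placeholder for whatever the test scans for.  */
/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { powerpc_vsx_ok && vect_hw_misalign } } } } */
```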