https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91446

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
Not sure what you are after.  We compute the following for the scalar cost

0x4ad73f0 width_2(D) 1 times scalar_store costs 6 in body
0x4ad73f0 height_4(D) 1 times scalar_store costs 6 in body
0x4ad73f0 x_6(D) 1 times scalar_store costs 6 in body
0x4ad73f0 y_8(D) 1 times scalar_store costs 6 in body

and for the vectorized cost

0x4bd8330 width_2(D) 1 times vec_construct costs 8 in prologue
0x4bd8330 width_2(D) 1 times vector_store costs 16 in body
0x4bd8330 x_6(D) 1 times vec_construct costs 8 in prologue
0x4bd8330 x_6(D) 1 times vector_store costs 16 in body

t.i:17:3: note:  Cost model analysis:
  Vector inside of basic block cost: 32
  Vector prologue cost: 16
  Vector epilogue cost: 0
  Scalar cost of basic block: 24
t.i:17:3: missed:  not vectorized: vectorization is not profitable.
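
The totals in this analysis can be re-derived from the per-statement lines of the dump. The following is a minimal C sketch, not GCC code: all names are hypothetical, and the comparison rule (the vector total must be strictly below the scalar total) is an assumption made for illustration, not something the dump states.

```c
#include <stddef.h>

/* Per-statement costs copied from the dump lines above. */
const int scalar_body_costs[]     = {6, 6, 6, 6};  /* 4 x scalar_store   */
const int vector_prologue_costs[] = {8, 8};        /* 2 x vec_construct  */
const int vector_body_costs[]     = {16, 16};      /* 2 x vector_store   */

/* Sum an array of per-statement costs. */
int sum_costs(const int *costs, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += costs[i];
    return total;
}

/* "Scalar cost of basic block" in the dump. */
int scalar_total(void)
{
    return sum_costs(scalar_body_costs, 4);
}

/* Vector inside cost + prologue cost + epilogue cost (0 in the dump). */
int vector_total(void)
{
    return sum_costs(vector_body_costs, 2)
         + sum_costs(vector_prologue_costs, 2)
         + 0;
}

/* Assumed decision rule: profitable only if strictly cheaper. */
int vectorization_profitable(void)
{
    return vector_total() < scalar_total();
}
```

With the numbers from the dump this gives a vector total of 32 + 16 + 0 = 48 against a scalar total of 24, so the "not profitable" verdict follows by a wide margin.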

I assume skylake-avx512 uses skylake_cost.  The only issue I see is the fixed
use of [0]/[2] (SImode/SFmode), also because the cost tables do not have
entries for DImode int_load/store.

Skylake costs are odd here:

  {4, 4, 4},                            /* cost of loading integer registers
                                           in QImode, HImode and SImode.
                                           Relative to reg-reg move (2).  */
  {6, 6, 3},                            /* cost of storing integer registers */

Why is the SImode store cost 3?  (Looks like a "benchmark" random-number
generator?)

  {6, 6, 6, 10, 20},                   /* cost of loading SSE registers
                                           in 32,64,128,256 and 512-bit */
  {6, 6, 6, 10, 20},                    /* cost of unaligned loads.  */
  {8, 8, 8, 12, 24},                    /* cost of storing SSE registers
                                           in 32,64,128,256 and 512-bit */
  {8, 8, 8, 8, 16},                     /* cost of unaligned stores.  */

Again, unaligned SSE stores are cheaper than aligned ones for 256 and 512
bits?!
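
The oddity in the SSE store entries can be found mechanically. This is a small C sketch under the assumption that an unaligned store should never be cheaper than an aligned one; the array and function names are hypothetical, and only the numbers come from the table quoted above.

```c
#include <stddef.h>

#define N_SSE_SIZES 5

/* Widths and costs copied from the quoted skylake table. */
const int sse_size_bits[N_SSE_SIZES]       = {32, 64, 128, 256, 512};
const int sse_store[N_SSE_SIZES]           = {8, 8, 8, 12, 24};
const int sse_unaligned_store[N_SSE_SIZES] = {8, 8, 8, 8, 16};

/* Store into out[] every width (in bits) whose unaligned store cost is
   below the aligned store cost; return how many were found.  */
size_t unaligned_cheaper_widths(int out[N_SSE_SIZES])
{
    size_t n = 0;
    for (size_t i = 0; i < N_SSE_SIZES; i++)
        if (sse_unaligned_store[i] < sse_store[i])
            out[n++] = sse_size_bits[i];
    return n;
}
```

For the quoted table the check flags exactly the 256-bit and 512-bit entries (8 vs. 12 and 16 vs. 24).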

In the end the scalar code is not vectorized because of the vector
construction cost and because the vector store cost is higher than the scalar
store cost, which oddly is too cheap (3 vs. expected 6).
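
To see how much the SImode store entry matters, here is a hedged what-if sketch in C. It assumes that the costs printed in the dump are twice the table entries, which matches the numbers here (store entry 3 shows up as scalar_store cost 6, SSE store entry 8 as vector_store cost 16) but is not confirmed by the dump itself. The function names are hypothetical.

```c
/* Assumption: dump cost = 2 * table entry (consistent with 3 -> 6 and
   8 -> 16 above, but not taken from the GCC sources).  */
int dump_cost_from_table(int table_entry)
{
    return 2 * table_entry;
}

/* Scalar cost of the basic block: four scalar stores, as in the dump. */
int scalar_total_for_store_entry(int table_entry)
{
    return 4 * dump_cost_from_table(table_entry);
}
```

Under that assumption, a store entry of 3 reproduces the scalar cost of 24, while an entry of 6 would give 48, which only ties with the vector total of 32 + 16 = 48. Whether a tie would be enough to vectorize depends on the comparison the vectorizer uses, which is not shown here.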

So - INVALID?
