On Thu, 11 Oct 2018, Richard Biener wrote: > > The following fixes vec_construct cost calculation to properly consider > that the inserts will happen to SSE regs thus forgo the multiplication > done in ix86_vec_cost which is passed the wrong mode. This gets rid of > the only call passing false to ix86_vec_cost (so consider the patch > amended to remove the arg if approved). > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > OK for trunk?
PING. > I am considering to make the factor we apply in ix86_vec_cost > which currently depends on X86_TUNE_AVX128_OPTIMAL and > X86_TUNE_SSE_SPLIT_REGS part of the actual cost tables since > the reason we apply them are underlying CPU architecture details. > Was the original reason of doing the multiplication based on > those tunings to be able to "share" the same basic cost table > across architectures that differ in this important detail? > I see X86_TUNE_SSE_SPLIT_REGS is only used for m_ATHLON_K8 > and X86_TUNE_AVX128_OPTIMAL is used for m_BDVER, m_BTVER2 > and m_ZNVER1. Those all have (multiple) exclusive processor_cost_table > entries. > > As a first step I'd like to remove the use of ix86_vec_cost for > the entries that already have entries for multiple modes > (loads and stores) and apply the factor there. For example > Zen can do two 128bit loads per cycle but only one 128bit store. > With multiplying AVX256 costs by two we seem to cost sth like > # instructions to dispatch * instruction latency which is an > odd thing. I'd have expected # instructions to dispatch / instruction > throughput * instruction latency - so a AVX256 add would cost > the same as a AVX128 add, likewise for loads but stores would be > more expensive because of the throughput issue. This all > ignores resource utilization across multiple insns but that's > how the cost model works ... > > Thanks, > Richard. > > 2018-10-11 Richard Biener <rguent...@suse.de> > > * config/i386/i386.c (ix86_vec_cost): Remove !parallel path > and argument. > (ix86_builtin_vectorization_cost): For vec_construct properly > cost insertion into SSE regs. > (...): Adjust calls to ix86_vec_cost. > > Index: gcc/config/i386/i386.c > =================================================================== > --- gcc/config/i386/i386.c (revision 265022) > +++ gcc/config/i386/i386.c (working copy) > @@ -39846,11 +39846,10 @@ ix86_set_reg_reg_cost (machine_mode mode > static int > ix86_vec_cost (machine_mode mode, int cost, bool parallel) > { > + gcc_assert (parallel); > if (!VECTOR_MODE_P (mode)) > return cost; > - > - if (!parallel) > - return cost * GET_MODE_NUNITS (mode); > + > if (GET_MODE_BITSIZE (mode) == 128 > && TARGET_SSE_SPLIT_REGS) > return cost * 2; > @@ -45190,8 +45189,9 @@ ix86_builtin_vectorization_cost (enum ve > > case vec_construct: > { > - /* N element inserts. */ > - int cost = ix86_vec_cost (mode, ix86_cost->sse_op, false); > + gcc_assert (VECTOR_MODE_P (mode)); > + /* N element inserts into SSE vectors. */ > + int cost = GET_MODE_NUNITS (mode) * ix86_cost->sse_op; > /* One vinserti128 for combining two SSE vectors for AVX256. */ > if (GET_MODE_BITSIZE (mode) == 256) > cost += ix86_vec_cost (mode, ix86_cost->addss, true); > > -- Richard Biener <rguent...@suse.de> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)