On Thu, 11 Oct 2018, Richard Biener wrote:

> 
> The following fixes vec_construct cost calculation to properly consider
> that the inserts will happen to SSE regs thus forgo the multiplication
> done in ix86_vec_cost which is passed the wrong mode.  This gets rid of
> the only call passing false to ix86_vec_cost (so consider the patch
> amended to remove the arg if approved).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?

PING.

> I am considering to make the factor we apply in ix86_vec_cost
> which currently depends on X86_TUNE_AVX128_OPTIMAL and
> X86_TUNE_SSE_SPLIT_REGS part of the actual cost tables since
> the reason we apply them are underlying CPU architecture details.
> Was the original reason of doing the multiplication based on
> those tunings to be able to "share" the same basic cost table
> across architectures that differ in this important detail?
> I see X86_TUNE_SSE_SPLIT_REGS is only used for m_ATHLON_K8
> and X86_TUNE_AVX128_OPTIMAL is used for m_BDVER, m_BTVER2
> and m_ZNVER1.  Those all have (multiple) exclusive processor_cost_table
> entries.
> 
> As a first step I'd like to remove the use of ix86_vec_cost for
> the entries that already have entries for multiple modes
> (loads and stores) and apply the factor there.  For example
> Zen can do two 128bit loads per cycle but only one 128bit store.
> With multiplying AVX256 costs by two we seem to cost sth like
> # instructions to dispatch * instruction latency which is an
> odd thing.  I'd have expected # instructions to dispatch / instruction 
> throughput * instruction latency - so a AVX256 add would cost
> the same as a AVX128 add, likewise for loads but stores would be
> more expensive because of the throughput issue.  This all
> ignores resource utilization across multiple insns but that's
> how the cost model works ...
> 
> Thanks,
> Richard.
> 
> 2018-10-11  Richard Biener  <rguent...@suse.de>
> 
>       * config/i386/i386.c (ix86_vec_cost): Remove !parallel path
>       and argument.
>       (ix86_builtin_vectorization_cost): For vec_construct properly
>       cost insertion into SSE regs.
>       (...): Adjust calls to ix86_vec_cost.
> 
> Index: gcc/config/i386/i386.c
> ===================================================================
> --- gcc/config/i386/i386.c    (revision 265022)
> +++ gcc/config/i386/i386.c    (working copy)
> @@ -39846,11 +39846,10 @@ ix86_set_reg_reg_cost (machine_mode mode
>  static int
>  ix86_vec_cost (machine_mode mode, int cost, bool parallel)
>  {
> +  gcc_assert (parallel);
>    if (!VECTOR_MODE_P (mode))
>      return cost;
> - 
> -  if (!parallel)
> -    return cost * GET_MODE_NUNITS (mode);
> +
>    if (GET_MODE_BITSIZE (mode) == 128
>        && TARGET_SSE_SPLIT_REGS)
>      return cost * 2;
> @@ -45190,8 +45189,9 @@ ix86_builtin_vectorization_cost (enum ve
>  
>        case vec_construct:
>       {
> -       /* N element inserts.  */
> -       int cost = ix86_vec_cost (mode, ix86_cost->sse_op, false);
> +       gcc_assert (VECTOR_MODE_P (mode));
> +       /* N element inserts into SSE vectors.  */
> +       int cost = GET_MODE_NUNITS (mode) * ix86_cost->sse_op;
>         /* One vinserti128 for combining two SSE vectors for AVX256.  */
>         if (GET_MODE_BITSIZE (mode) == 256)
>           cost += ix86_vec_cost (mode, ix86_cost->addss, true);
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)

Reply via email to