Re: [PATCH v2] Consider frequency in cost estimation when converting scalar to vector.

Jan Hubicka Tue, 29 Apr 2025 13:07:02 -0700

> > I am generally trying to get rid of remaing uses of REG_FREQ since the
> > 10000 based fixed point arithmetics iot always working that well.
> >
> > You can do the sums in profile_count type (doing something reasonable
> > when count is uninitialized) and then convert it to sreal for the final
> > heuristics.
> 
> Changed, when count_max is initialized and function is not optimized to size,
> use
>       bb_count.initialized_p
>       ? bb_count.probability_in (count_max).to_sreal ()
>       : 0.1;
> 
> Otherwise, just use 1.
> 
> 
> n some benchmark, I notice stv failed due to cost unprofitable, but the igain
> is inside the loop, but sse<->integer conversion is outside the loop, current 
> cost
> model doesn't consider the frequency of those gain/cost.
> The patch weights those cost with frequency.
> 
> The patch regressed gcc.target/i386/minmax-6.c under -m32 Since the
> place of integer<->sse is before the branch, and the conversion to
> min/max is in the branch, with static profile, the cost model is not
> profitable anymore which is exactly the patch try to do.
> Considering the original testcase is to guard RA issue, so restrict
> the testcase under ! ia32 should still be ok.
> 
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
> 
> gcc/ChangeLog:
> 
>       * config/i386/i386-features.cc (get_bb_probability): New function.
>       (scalar_chain::mark_dual_mode_def): Weight
>       n_integer_to_sse/n_sse_to_integer with bb frequency.
>       (general_scalar_chain::compute_convert_gain): Ditto, and
>       adjust function prototype to return true/false when cost model
>       is profitable or not.
>       (timode_scalar_chain::compute_convert_gain): Ditto.
>       (convert_scalars_to_vector): Adjust after the upper two
>       function prototype are changed.
>       * config/i386/i386-features.h (class scalar_chain): Change
>       n_integer_to_sse/n_sse_to_integer from int to sreal.
>       (class general_scalar_chain): Adjust prototype to return bool
>       intead of int.
>       (class timode_scalar_chain): Ditto.
> 
> gcc/testsuite/ChangeLog:
>       * gcc.target/i386/minmax-6.c: Adjust testcase.
> ---
>  gcc/config/i386/i386-features.cc         | 83 +++++++++++++++++-------
>  gcc/config/i386/i386-features.h          | 10 +--
>  gcc/testsuite/gcc.target/i386/minmax-6.c |  2 +-
>  3 files changed, 65 insertions(+), 30 deletions(-)
> 
> diff --git a/gcc/config/i386/i386-features.cc 
> b/gcc/config/i386/i386-features.cc
> index c35ac24fd8a..57dcd94c211 100644
> --- a/gcc/config/i386/i386-features.cc
> +++ b/gcc/config/i386/i386-features.cc
> @@ -326,6 +326,30 @@ scalar_chain::add_to_queue (unsigned insn_uid)
>            insn_uid, chain_id);
>  }
>  
> +/* Return BB probablity in cfun->cfg->count_max,
> +   if count_max is uninitialized or optimized_for_size, return 1.
> +   else if bb count is uninitialized, return 0.1 as a rough estimation.
> +   else return bb_count.probablity_in (count_max).  */
> +static sreal
> +get_bb_probability (basic_block bb)


I think you want relative bb frequency that you can get by:

  sreal_freq = bb->count.to_sreal_scale (ENTRY_BLOCK_PTR_FOR_FN (fun)->count)

The value returned will be 1 if BB is executed once per invocation of
functions and >1 if it is in loop etc.
Also if BB count is uninitialized it will return 1 that is probably
reasonable deafult.
> +{
> +  profile_count bb_count = bb->count;
> +  profile_count count_max = cfun->cfg->count_max;
> +  sreal sreal_count = 1;
> +  if (count_max.initialized_p ()
> +      && !optimize_function_for_size_p (cfun))
> +    {
> +      if (bb_count.initialized_p ())
> +     sreal_count = bb_count.probability_in (count_max).to_sreal ();
> +      else
> +     /* For unintialized BB, weight it with 0.1,
> +        ??? just like REG_FREQ_MAX / BB_FREQ_MAX.   */
> +     sreal_count = (sreal) 1 / (sreal) 10;
> +    }
> +
> +  return sreal_count;
> +}
> +
>  /* For DImode conversion, mark register defined by DEF as requiring
>     conversion.  */
>  
> @@ -337,18 +361,20 @@ scalar_chain::mark_dual_mode_def (df_ref def)
>    /* Record the def/insn pair so we can later efficiently iterate over
>       the defs to convert on insns not in the chain.  */
>    bool reg_new = bitmap_set_bit (defs_conv, DF_REF_REGNO (def));
> +  sreal sreal_count = get_bb_probability (BLOCK_FOR_INSN (DF_REF_INSN 
> (def)));
> +
>    if (!bitmap_bit_p (insns, DF_REF_INSN_UID (def)))
>      {
>        if (!bitmap_set_bit (insns_conv, DF_REF_INSN_UID (def))
>         && !reg_new)
>       return;
> -      n_integer_to_sse++;
> +      n_integer_to_sse += sreal_count;
>      }
>    else
>      {
>        if (!reg_new)
>       return;
> -      n_sse_to_integer++;
> +      n_sse_to_integer += sreal_count;

Generaly I am trying to sum counts in profile_count data type and only
later convert them to relative frequencies.  This will work too, but +
of sreal is bit more expensive.
>      }
>  
>    if (dump_file)
> @@ -529,15 +555,15 @@ general_scalar_chain::vector_const_cost (rtx exp)
>    return ix86_cost->sse_load[smode == DImode ? 1 : 0];
>  }
>  
> -/* Compute a gain for chain conversion.  */
> +/* Return true if it's cost profitable for chain conversion.  */
>  
> -int
> +bool
>  general_scalar_chain::compute_convert_gain ()
>  {
>    bitmap_iterator bi;
>    unsigned insn_uid;
> -  int gain = 0;
> -  int cost = 0;
> +  sreal gain = 0;
> +  sreal cost = 0;

so gain is the difference of runtime of integer variant compared to
vector vairant and cost are the extra int->see and sse->int conversions
needed?

If you scale everything by a BB frequency, you will get a weird
behaviour if chain happens to consist only of instructions in BBs with 0
frequency (that commonly happens in FDO).

I think you want to look at the chain first and see if it contains a hot
instruction. If it does, use performance metrics (ix86_cost and scaling
by BB frequencies) if it does not use size metrics (ix86_size_cost and
all scales 1)

Honza
>  
>    if (dump_file)
>      fprintf (dump_file, "Computing gain for chain #%d...\n", chain_id);
> @@ -556,6 +582,7 @@ general_scalar_chain::compute_convert_gain ()
>        rtx src = SET_SRC (def_set);
>        rtx dst = SET_DEST (def_set);
>        int igain = 0;
> +      sreal sreal_count = get_bb_probability (BLOCK_FOR_INSN (insn));

>  
>        if (REG_P (src) && REG_P (dst))
>       igain += 2 * m - ix86_cost->xmm_move;
> @@ -755,16 +782,19 @@ general_scalar_chain::compute_convert_gain ()
>           }
>       }
>  
> +      sreal real_gain = sreal_count * igain;
>        if (igain != 0 && dump_file)
>       {
> -       fprintf (dump_file, "  Instruction gain %d for ", igain);
> +       fprintf (dump_file, "  Instruction gain %.2f for ",
> +                real_gain.to_double ());
>         dump_insn_slim (dump_file, insn);
>       }
> -      gain += igain;
> +      gain += real_gain;
>      }
>  
>    if (dump_file)
> -    fprintf (dump_file, "  Instruction conversion gain: %d\n", gain);
> +    fprintf (dump_file, "  Instruction conversion gain: %.2f\n",
> +          gain.to_double ());
>  
>    /* Cost the integer to sse and sse to integer moves.  */
>    if (!optimize_function_for_size_p (cfun))
> @@ -796,14 +826,14 @@ general_scalar_chain::compute_convert_gain ()
>      }
>  
>    if (dump_file)
> -    fprintf (dump_file, "  Registers conversion cost: %d\n", cost);
> -
> -  gain -= cost;
> +    fprintf (dump_file, "  Registers conversion cost: %.2f\n",
> +          cost.to_double ());
>  
>    if (dump_file)
> -    fprintf (dump_file, "  Total gain: %d\n", gain);
> +    fprintf (dump_file, "  Total gain: %.2f\n, total cost: %.2f",
> +          gain.to_double (), cost.to_double ());
>  
> -  return gain;
> +  return gain > cost ;
>  }
>  
>  /* Insert generated conversion instruction sequence INSNS
> @@ -1520,21 +1550,22 @@ timode_immed_const_gain (rtx cst)
>    return 0;
>  }
>  
> -/* Compute a gain for chain conversion.  */
> +/* Return true it's cost profitable for for chain conversion.  */
>  
> -int
> +bool
>  timode_scalar_chain::compute_convert_gain ()
>  {
>    /* Assume that if we have to move TImode values between units,
>       then transforming this chain isn't worth it.  */
> -  if (n_sse_to_integer || n_integer_to_sse)
> -    return -1;
> +  if (n_sse_to_integer.to_int ()
> +      || n_integer_to_sse.to_int ())
> +    return false;
>  
>    bitmap_iterator bi;
>    unsigned insn_uid;
>  
>    /* Split ties to prefer V1TImode when not optimizing for size.  */
> -  int gain = optimize_size ? 0 : 1;
> +  sreal gain = optimize_size ? 0 : 1;
>  
>    if (dump_file)
>      fprintf (dump_file, "Computing gain for chain #%d...\n", chain_id);
> @@ -1548,6 +1579,7 @@ timode_scalar_chain::compute_convert_gain ()
>        HOST_WIDE_INT op1val;
>        int scost, vcost;
>        int igain = 0;
> +      sreal sreal_count = get_bb_probability (BLOCK_FOR_INSN (insn));
>  
>        switch (GET_CODE (src))
>       {
> @@ -1754,18 +1786,21 @@ timode_scalar_chain::compute_convert_gain ()
>         break;
>       }
>  
> +      sreal real_gain = sreal_count * igain;
>        if (igain != 0 && dump_file)
>       {
> -       fprintf (dump_file, "  Instruction gain %d for ", igain);
> +       
> +       fprintf (dump_file, "  Instruction gain %.2f for ", 
> real_gain.to_double ());
>         dump_insn_slim (dump_file, insn);
>       }
> -      gain += igain;
> +      
> +      gain += real_gain;
>      }
>  
>    if (dump_file)
> -    fprintf (dump_file, "  Total gain: %d\n", gain);
> +    fprintf (dump_file, "  Total gain: %.2f\n", gain.to_double ());
>  
> -  return gain;
> +  return gain > (sreal) 0;
>  }
>  
>  /* Fix uses of converted REG in debug insns.  */
> @@ -2561,7 +2596,7 @@ convert_scalars_to_vector (bool timode_p)
>            conversions.  */
>         if (chain->build (&candidates[i], uid, disallowed))
>           {
> -           if (chain->compute_convert_gain () > 0)
> +           if (chain->compute_convert_gain ())
>               converted_insns += chain->convert ();
>             else if (dump_file)
>               fprintf (dump_file, "Chain #%d conversion is not profitable\n",
> diff --git a/gcc/config/i386/i386-features.h b/gcc/config/i386/i386-features.h
> index 24b0c4ed0cd..cc40c5c9841 100644
> --- a/gcc/config/i386/i386-features.h
> +++ b/gcc/config/i386/i386-features.h
> @@ -153,12 +153,12 @@ class scalar_chain
>  
>    bitmap insns_conv;
>    hash_map<rtx, rtx> defs_map;
> -  unsigned n_sse_to_integer;
> -  unsigned n_integer_to_sse;
> +  sreal n_sse_to_integer;
> +  sreal n_integer_to_sse;
>    auto_vec<rtx_insn *> control_flow_insns;
>  
>    bool build (bitmap candidates, unsigned insn_uid, bitmap disallowed);
> -  virtual int compute_convert_gain () = 0;
> +  virtual bool compute_convert_gain () = 0;
>    int convert ();
>  
>   protected:
> @@ -184,7 +184,7 @@ class general_scalar_chain : public scalar_chain
>   public:
>    general_scalar_chain (enum machine_mode smode_, enum machine_mode vmode_)
>      : scalar_chain (smode_, vmode_) {}
> -  int compute_convert_gain () final override;
> +  bool compute_convert_gain () final override;
>  
>   private:
>    void convert_insn (rtx_insn *insn) final override;
> @@ -196,7 +196,7 @@ class timode_scalar_chain : public scalar_chain
>  {
>   public:
>    timode_scalar_chain () : scalar_chain (TImode, V1TImode) {}
> -  int compute_convert_gain () final override;
> +  bool compute_convert_gain () final override;
>  
>   private:
>    void fix_debug_reg_uses (rtx reg);
> diff --git a/gcc/testsuite/gcc.target/i386/minmax-6.c 
> b/gcc/testsuite/gcc.target/i386/minmax-6.c
> index 615f919ba0a..b5b4363f931 100644
> --- a/gcc/testsuite/gcc.target/i386/minmax-6.c
> +++ b/gcc/testsuite/gcc.target/i386/minmax-6.c
> @@ -15,4 +15,4 @@ UMVLine16Y_11 (short unsigned int * Pic, int y, int width)
>  /* We do not want the RA to spill %esi for it's dual-use but using
>     pmaxsd is OK.  */
>  /* { dg-final { scan-assembler-not "rsp" { target { ! { ia32 } } } } } */
> -/* { dg-final { scan-assembler "pmaxsd" } } */
> +/* { dg-final { scan-assembler "pmaxsd" { target { ! ia32 } } } } */
> -- 
> 2.34.1
>

Re: [PATCH v2] Consider frequency in cost estimation when converting scalar to vector.

Reply via email to