Re: New parameters to control stringop expansion libcall strategy

Xinliang David Li Tue, 06 Aug 2013 09:43:39 -0700

Corrected two small problems reported by the style checker (the
warnings about the EnumValue for options in stringop.opt are not
valid).


On Tue, Aug 6, 2013 at 1:46 AM, Michael Zolotukhin
<michael.v.zolotuk...@gmail.com> wrote:
> There are still some formatting issues (like 8 spaces instead of a
> tab, wrong indentation of the do-loop, and some other places); to reveal
> some of them you could use the contrib/check_GNU_style.sh script.
> But that was nitpicking again :) Actually, I wanted to ask whether
> you're going to use this option for some performance experiments
> involving memmove/memset. If so, perhaps you could tune the existing
> cost models as well? Is that possible?

The option is designed for exactly this kind of purpose.

thanks,

David

>
> Michael
>
> On 5 August 2013 20:44, Xinliang David Li <davi...@google.com> wrote:
>> thanks. Updated patch attached.
>>
>> David
>>
>> On Mon, Aug 5, 2013 at 3:57 AM, Michael V. Zolotukhin
>> <michael.v.zolotuk...@gmail.com> wrote:
>>> Hi,
>>> This is a really convenient option, thanks for working on it.
>>> I can't approve it as I'm not a maintainer, but it looks ok to me,
>>> except for a small nitpick: AFAIR, comments should end with
>>> dot-space-space.
>>>
>>> Michael
>>>
>>> On 04 Aug 20:01, Xinliang David Li wrote:
>>>> Attached is a new patch implementing stringop inline strategy
>>>> control using two new -m options:
>>>>
>>>> -mmemcpy-strategy=
>>>> -mmemset-strategy=
>>>>
>>>> See changes in doc/invoke.texi for description of the new options. Example:
>>>>
>>>> -mmemcpy-strategy=rep_8byte:64:noalign,unrolled_loop:2048:noalign,libcall:-1:noalign
>>>>
>>>> tells the compiler to inline memcpy using rep_8byte when the size is
>>>> no larger than 64 bytes, using unrolled_loop when the size is no larger
>>>> than 2048 bytes, and to use a library call for sizes > 2048. In all
>>>> cases, destination alignment adjustment is not done.
>>>>
>>>> Tested on x86-64/linux. Ok for trunk?
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>> 2013-08-02  Xinliang David Li  <davi...@google.com>
>>>>
>>>>         * config/i386/stringop.def: New file.
>>>>         * config/i386/stringop.opt: New file.
>>>>         * config/i386/i386-opts.h: Include stringop.def.
>>>>         * config/i386/i386.opt: Include stringop.opt.
>>>>         * config/i386/i386.c (ix86_option_override_internal):
>>>>         Override default size based stringop inline strategies
>>>>         with options.
>>>>         (ix86_parse_stringop_strategy_string): New function.
>>>>
>>>> 2013-08-04  Xinliang David Li  <davi...@google.com>
>>>>
>>>>         * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>>>>         * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>>>>         * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>>>>         * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davi...@google.com> wrote:
>>>> > On x86_64, when the expected size of memcpy/memset is known (e.g., with
>>>> > FDO), the libcall strategy is used when the size is > 8192. This value
>>>> > is hard-coded, which makes performance tuning difficult. This patch
>>>> > adds two new parameters to control it. Potential uses include
>>>> > per-application tuning of the libcall min-size based on FDO summary
>>>> > data (e.g., instruction working set size).
>>>> >
>>>> > Bootstrapped and tested on x86_64/linux. Ok for trunk?
>>>> >
>>>> > thanks,
>>>> >
>>>> > David
>>>> >
>>>> >
>>>> > 2013-08-02  Xinliang David Li  <davi...@google.com>
>>>> >
>>>> >         * params.def: New parameters.
>>>> >         * config/i386/i386.c (ix86_option_override_internal):
>>>> >         Override default libcall size limit with parameters.
>>>
>>>> Index: config/i386/stringop.def
>>>> ===================================================================
>>>> --- config/i386/stringop.def  (revision 0)
>>>> +++ config/i386/stringop.def  (revision 0)
>>>> @@ -0,0 +1,42 @@
>>>> +/* Definitions for option handling for IA-32.
>>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify
>>>> +it under the terms of the GNU General Public License as published by
>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>> +any later version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful,
>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> +GNU General Public License for more details.
>>>> +
>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>> +permissions described in the GCC Runtime Library Exception, version
>>>> +3.1, as published by the Free Software Foundation.
>>>> +
>>>> +You should have received a copy of the GNU General Public License and
>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>> +<http://www.gnu.org/licenses/>.  */
>>>> +
>>>> +DEF_ENUM
>>>> +DEF_ALG (no_stringop, no_stringop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (libcall, libcall)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (loop_1_byte, byte_loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (loop, loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (unrolled_loop, unrolled_loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (vector_loop, vector_loop)
>>>> Index: config/i386/i386.opt
>>>> ===================================================================
>>>> --- config/i386/i386.opt      (revision 201458)
>>>> +++ config/i386/i386.opt      (working copy)
>>>> @@ -316,6 +316,14 @@ mstack-arg-probe
>>>>  Target Report Mask(STACK_PROBE) Save
>>>>  Enable stack probing
>>>>
>>>> +mmemcpy-strategy=
>>>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>>>> +Specify memcpy expansion strategy when expected size is known
>>>> +
>>>> +mmemset-strategy=
>>>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>>>> +Specify memset expansion strategy when expected size is known
>>>> +
>>>>  mstringop-strategy=
>>>>  Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>>>>  Chose strategy to generate stringop using
>>>> Index: config/i386/stringop.opt
>>>> ===================================================================
>>>> --- config/i386/stringop.opt  (revision 0)
>>>> +++ config/i386/stringop.opt  (revision 0)
>>>> @@ -0,0 +1,36 @@
>>>> +/* Definitions for option handling for IA-32.
>>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify
>>>> +it under the terms of the GNU General Public License as published by
>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>> +any later version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful,
>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>>>> +GNU General Public License for more details.
>>>> +
>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>> +permissions described in the GCC Runtime Library Exception, version
>>>> +3.1, as published by the Free Software Foundation.
>>>> +
>>>> +You should have received a copy of the GNU General Public License and
>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>> +see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>> +<http://www.gnu.org/licenses/>.  */
>>>> +
>>>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#define DEF_ENUM EnumValue
>>>> +
>>>> +#undef DEF_ALG
>>>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>>>> +
>>>> +#include "stringop.def"
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>> Index: config/i386/i386.c
>>>> ===================================================================
>>>> --- config/i386/i386.c        (revision 201458)
>>>> +++ config/i386/i386.c        (working copy)
>>>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>>>>  };
>>>>
>>>>  /* Processor costs (relative to an add) */
>>>> -static const
>>>> +static
>>>>  struct processor_costs i386_cost = { /* 386 specific costs */
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = {      /*
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs i486_cost = { /* 486 specific costs */
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = {      /*
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs pentium_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs pentiumpro_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs geode_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs k6_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs athlon_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs k8_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (2),                 /* cost of a lea instruction */
>>>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs pentium4_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (3),                 /* cost of a lea instruction */
>>>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs nocona_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1),                 /* cost of a lea instruction */
>>>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>>>>    1,                                 /* cond_not_taken_branch_cost.  */
>>>>  };
>>>>
>>>> -static const
>>>> +static
>>>>  struct processor_costs atom_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>>>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>>>>  };
>>>>
>>>>  /* Generic64 should produce code tuned for Nocona and K8.  */
>>>> -static const
>>>> +static
>>>>  struct processor_costs generic64_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>>>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>>>>  };
>>>>
>>>>  /* core_cost should produce code tuned for Core familly of CPUs.  */
>>>> -static const
>>>> +static
>>>>  struct processor_costs core_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    /* On all chips taken into consideration lea is 2 cycles and more.  With
>>>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>>>>
>>>>  /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>>>>     Athlon and K8.  */
>>>> -static const
>>>> +static
>>>>  struct processor_costs generic32_cost = {
>>>>    COSTS_N_INSNS (1),                 /* cost of an add instruction */
>>>>    COSTS_N_INSNS (1) + 1,             /* cost of a lea instruction */
>>>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>>>>
>>>>    return;
>>>>  }
>>>> +
>>>> +static const char *stringop_alg_names[] = {
>>>> +#define DEF_ENUM
>>>> +#define DEF_ALG(alg, name) #name,
>>>> +#include "stringop.def"
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>> +};
>>>> +
>>>> +/* Parse the parameter string passed to -mmemcpy-strategy= or
>>>> +   -mmemset-strategy=.  The string is of the following form (or a
>>>> +   comma-separated list of it):
>>>> +
>>>> +     strategy_alg:max_size:[align|noalign]
>>>> +
>>>> +   where the full size range for the strategy is either [0, max_size] or
>>>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
>>>> +   preceding range.  The last size range must have max_size == -1.
>>>> +
>>>> +   Examples:
>>>> +
>>>> +   1.
>>>> +      -mmemcpy-strategy=libcall:-1:noalign
>>>> +
>>>> +      This is equivalent to (for known size memcpy)
>>>> +      -mstringop-strategy=libcall.
>>>> +
>>>> +   2.
>>>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>>>> +
>>>> +      This tells the compiler to use the following strategy for memset:
>>>> +      1) when the expected size is between [0, 16], use rep_8byte strategy;
>>>> +      2) when the size is between [17, 2048], use vector_loop;
>>>> +      3) when the size is > 2048, use libcall.  */
>>>> +
>>>> +struct stringop_size_range
>>>> +{
>>>> +  int min;
>>>> +  int max;
>>>> +  stringop_alg alg;
>>>> +  bool noalign;
>>>> +};
>>>> +
>>>> +static void
>>>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>>>> +{
>>>> +  const struct stringop_algs *default_algs;
>>>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>>>> +  char *curr_range_str, *next_range_str;
>>>> +  const char *opt = is_memset ? "-mmemset-strategy=" : "-mmemcpy-strategy=";
>>>> +  int i = 0, n = 0;
>>>> +
>>>> +  if (is_memset)
>>>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>>>> +  else
>>>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>>>> +
>>>> +  curr_range_str = strategy_str;
>>>> +
>>>> +  do
>>>> +    {
>>>> +      int mins, maxs;
>>>> +      stringop_alg alg;
>>>> +      char alg_name[128];
>>>> +      char align[16];
>>>> +
>>>> +      next_range_str = strchr (curr_range_str, ',');
>>>> +      if (next_range_str)
>>>> +        *next_range_str++ = '\0';
>>>> +
>>>> +      /* Field widths keep sscanf within alg_name and align.  */
>>>> +      if (3 != sscanf (curr_range_str, "%127[^:]:%d:%15s",
>>>> +                       alg_name, &maxs, align))
>>>> +        {
>>>> +          warning (0, "wrong arg %s to option %s", curr_range_str, opt);
>>>> +          return;
>>>> +        }
>>>> +
>>>> +      /* Check before writing input_ranges[n] to avoid an overflow.  */
>>>> +      if (n >= MAX_STRINGOP_ALGS)
>>>> +        {
>>>> +          warning (0, "too many size ranges specified in option %s", opt);
>>>> +          return;
>>>> +        }
>>>> +
>>>> +      /* The first range starts at 0; each later range starts just past
>>>> +         the max of the preceding one.  */
>>>> +      mins = n > 0 ? input_ranges[n - 1].max + 1 : 0;
>>>> +      if (n > 0 && maxs < mins && maxs != -1)
>>>> +        {
>>>> +          warning (0, "size ranges of option %s should be increasing", opt);
>>>> +          return;
>>>> +        }
>>>> +
>>>> +      for (i = 0; i < last_alg; i++)
>>>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
>>>> +          {
>>>> +            alg = (stringop_alg) i;
>>>> +            break;
>>>> +          }
>>>> +
>>>> +      if (i == last_alg)
>>>> +        {
>>>> +          warning (0, "wrong stringop strategy name %s specified "
>>>> +                   "for option %s", alg_name, opt);
>>>> +          return;
>>>> +        }
>>>> +
>>>> +      input_ranges[n].min = mins;
>>>> +      input_ranges[n].max = maxs;
>>>> +      input_ranges[n].alg = alg;
>>>> +      if (!strcmp (align, "align"))
>>>> +        input_ranges[n].noalign = false;
>>>> +      else if (!strcmp (align, "noalign"))
>>>> +        input_ranges[n].noalign = true;
>>>> +      else
>>>> +        {
>>>> +          warning (0, "unknown alignment %s specified for option %s",
>>>> +                   align, opt);
>>>> +          return;
>>>> +        }
>>>> +      n++;
>>>> +      curr_range_str = next_range_str;
>>>> +    }
>>>> +  while (curr_range_str);
>>>> +
>>>> +  if (input_ranges[n - 1].max != -1)
>>>> +    {
>>>> +      warning (0, "the max value for the last size range should be -1"
>>>> +               " for option %s", opt);
>>>> +      return;
>>>> +    }
>>>> +
>>>> +  /* Now override the default algs array.  */
>>>> +  for (i = 0; i < n; i++)
>>>> +    {
>>>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>>>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>>>> +          = input_ranges[i].alg;
>>>> +      *const_cast<int *>(&default_algs->size[i].noalign)
>>>> +          = input_ranges[i].noalign;
>>>> +    }
>>>> +}
>>>> +
>>>>
>>>>  /* Override various settings based on options.  If MAIN_ARGS_P, the
>>>>     options are from the command line, otherwise they are from
>>>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>>>>    /* Handle stack protector */
>>>>    if (!global_options_set.x_ix86_stack_protector_guard)
>>>>      ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>>>> +
>>>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy=.  */
>>>> +  if (ix86_tune_memcpy_strategy)
>>>> +    {
>>>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
>>>> +      ix86_parse_stringop_strategy_string (str, false);
>>>> +      free (str);
>>>> +    }
>>>> +
>>>> +  if (ix86_tune_memset_strategy)
>>>> +    {
>>>> +      char *str = xstrdup (ix86_tune_memset_strategy);
>>>> +      ix86_parse_stringop_strategy_string (str, true);
>>>> +      free (str);
>>>> +    }
>>>>  }
>>>>
>>>>  /* Implement the TARGET_OPTION_OVERRIDE hook.  */
>>>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>      {
>>>>      case libcall:
>>>>      case no_stringop:
>>>> +    case last_alg:
>>>>        gcc_unreachable ();
>>>>      case loop_1_byte:
>>>>        need_zero_guard = true;
>>>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>      {
>>>>      case libcall:
>>>>      case no_stringop:
>>>> +    case last_alg:
>>>>        gcc_unreachable ();
>>>>      case loop_1_byte:
>>>>      case loop:
>>>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>      {
>>>>      case libcall:
>>>>      case no_stringop:
>>>> +    case last_alg:
>>>>        gcc_unreachable ();
>>>>      case loop:
>>>>        need_zero_guard = true;
>>>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>      {
>>>>      case libcall:
>>>>      case no_stringop:
>>>> +    case last_alg:
>>>>        gcc_unreachable ();
>>>>      case loop_1_byte:
>>>>      case loop:
>>>> Index: config/i386/i386-opts.h
>>>> ===================================================================
>>>> --- config/i386/i386-opts.h   (revision 201458)
>>>> +++ config/i386/i386-opts.h   (working copy)
>>>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>>>>  /* Algorithm to expand string function with.  */
>>>>  enum stringop_alg
>>>>  {
>>>> -   no_stringop,
>>>> -   libcall,
>>>> -   rep_prefix_1_byte,
>>>> -   rep_prefix_4_byte,
>>>> -   rep_prefix_8_byte,
>>>> -   loop_1_byte,
>>>> -   loop,
>>>> -   unrolled_loop,
>>>> -   vector_loop
>>>> +#undef DEF_ENUM
>>>> +#define DEF_ENUM
>>>> +
>>>> +#undef DEF_ALG
>>>> +#define DEF_ALG(alg, name) alg,
>>>> +
>>>> +#include "stringop.def"
>>>> +last_alg
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>>  };
>>>>
>>>>  /* Available call abi.  */
>>>> Index: doc/invoke.texi
>>>> ===================================================================
>>>> --- doc/invoke.texi   (revision 201458)
>>>> +++ doc/invoke.texi   (working copy)
>>>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>>>>  -mbmi2 -mrtm -mlwp -mthreads @gol
>>>>  -mno-align-stringops  -minline-all-stringops @gol
>>>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>>>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy} @gol
>>>>  -mpush-args  -maccumulate-outgoing-args  -m128bit-long-double @gol
>>>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>>>>  -mregparm=@var{num}  -msseregparm @gol
>>>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>>>>  Always use a library call.
>>>>  @end table
>>>>
>>>> +@item -mmemcpy-strategy=@var{strategy}
>>>> +@opindex mmemcpy-strategy=@var{strategy}
>>>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>>>> +should be inlined and what inline algorithm to use when the expected size
>>>> +of the copy operation is known.  @var{strategy}
>>>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>>>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
>>>> +the max byte size with which inline algorithm @var{alg} is allowed.  For the last
>>>> +triplet, the @var{max_size} must be @code{-1}.  The @var{max_size} of the triplets
>>>> +in the list must be specified in increasing order.  The minimal byte size for
>>>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
>>>> +preceding range.  @var{dest_align} is either @code{align} or @code{noalign};
>>>> +@code{noalign} means no destination alignment adjustment is done.
>>>> +
>>>> +@item -mmemset-strategy=@var{strategy}
>>>> +@opindex mmemset-strategy=@var{strategy}
>>>> +This option is similar to @option{-mmemcpy-strategy=} except that it controls
>>>> +@code{__builtin_memset} expansion.
>>>> +
>>>>  @item -momit-leaf-frame-pointer
>>>>  @opindex momit-leaf-frame-pointer
>>>>  Don't keep the frame pointer in a register for leaf functions.  This
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c     (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c     (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>>>> +++ testsuite/gcc.target/i386/memset-strategy-1.c     (revision 0)
>>>> @@ -0,0 +1,10 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "memset" 2  } } */
>>>> +
>>>> +char a[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memset (a, 1, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c     (revision 0)
>>>> @@ -0,0 +1,11 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "memcpy" 2  } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>
>
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.
