Corrected two small problems reported by the style checker (the warnings about the EnumValue for options in stringop.opt are not valid).

On Tue, Aug 6, 2013 at 1:46 AM, Michael Zolotukhin <michael.v.zolotuk...@gmail.com> wrote:
> There are still some formatting issues (like 8 spaces instead of a
> tab, wrong indentation of do-loop and some other places) - to reveal
> some of them you could use contrib/check_GNU_style.sh script.
> But that was a nitpicking again:) Actually I wanted to ask whether
> you're going to use this option for some performance experiments
> involving memmov/memset - if so, probably you could tune existing
> cost-models as well? Is it possible?

The option is designed for purposes like this.

thanks,

David

>
> Michael
>
> On 5 August 2013 20:44, Xinliang David Li <davi...@google.com> wrote:
>> thanks. Updated patch attached.
>>
>> David
>>
>> On Mon, Aug 5, 2013 at 3:57 AM, Michael V. Zolotukhin
>> <michael.v.zolotuk...@gmail.com> wrote:
>>> Hi,
>>> This is a really convenient option, thanks for working on it.
>>> I can't approve it as I'm not a maintainer, but it looks ok to me,
>>> except for a small nitpicking: afair, comments should end with
>>> dot-space-space.
>>>
>>> Michael
>>>
>>> On 04 Aug 20:01, Xinliang David Li wrote:
>>>> The attached is a new patch implementing the stringop inline strategy
>>>> control using two new -m options:
>>>>
>>>> -mmemcpy-strategy=
>>>> -mmemset-strategy=
>>>>
>>>> See changes in doc/invoke.texi for description of the new options. Example:
>>>>
>>>> -mmemcpy-strategy=rep_8byte:64:unaligned,unrolled_loop:2048:unaligned,libcall:-1:unaligned
>>>>
>>>> tells the compiler to inline memcpy using rep_8byte when the size is no
>>>> larger than 64 bytes, using unrolled_loop when size is no larger than
>>>> 2048, and for size > 2048, using library call. In all cases,
>>>> destination alignment adjustment is not done.
>>>>
>>>> Tested on x86-64/linux. Ok for trunk?
>>>>
>>>> thanks,
>>>>
>>>> David
>>>>
>>>> 2013-08-02 Xinliang David Li <davi...@google.com>
>>>>
>>>> * config/i386/stringop.def: New file.
>>>> * config/i386/stringop.opt: New file.
>>>> * config/i386/i386-opts.h: Include stringop.def.
>>>> * config/i386/i386.opt: Include stringop.opt.
>>>> * config/i386/i386.c (ix86_option_override_internal):
>>>> Override default size based stringop inline strategies
>>>> with options.
>>>> * config/i386/i386.c (ix86_parse_stringop_strategy_string):
>>>> New function.
>>>>
>>>> 2013-08-04 Xinliang David Li <davi...@google.com>
>>>>
>>>> * testsuite/gcc.target/i386/memcpy-strategy-1.c: New test.
>>>> * testsuite/gcc.target/i386/memcpy-strategy-2.c: Ditto.
>>>> * testsuite/gcc.target/i386/memset-strategy-1.c: Ditto.
>>>> * testsuite/gcc.target/i386/memcpy-strategy-3.c: Ditto.
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Aug 2, 2013 at 9:21 PM, Xinliang David Li <davi...@google.com> wrote:
>>>> > On x86_64, when the expected size of memcpy/memset is known (e.g., with
>>>> > FDO), libcall strategy is used when the size is > 8192. This value is
>>>> > hard coded, which makes it hard to do performance tuning. This patch
>>>> > adds two new parameters to do that. Potential usage includes
>>>> > per-application libcall strategy min-size tuning based on summary data
>>>> > with FDO (e.g., instruction workset size).
>>>> >
>>>> > Bootstrapped and tested on x86_64/linux. Ok for trunk?
>>>> >
>>>> > thanks,
>>>> >
>>>> > David
>>>> >
>>>> >
>>>> > 2013-08-02 Xinliang David Li <davi...@google.com>
>>>> >
>>>> > * params.def: New parameters.
>>>> > * config/i386/i386.c (ix86_option_override_internal):
>>>> > Override default libcall size limit with parameters.
>>>
>>>> Index: config/i386/stringop.def
>>>> ===================================================================
>>>> --- config/i386/stringop.def (revision 0)
>>>> +++ config/i386/stringop.def (revision 0)
>>>> @@ -0,0 +1,42 @@
>>>> +/* Definitions for option handling for IA-32.
>>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify
>>>> +it under the terms of the GNU General Public License as published by
>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>> +any later version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful,
>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>> +GNU General Public License for more details.
>>>> +
>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>> +permissions described in the GCC Runtime Library Exception, version
>>>> +3.1, as published by the Free Software Foundation.
>>>> +
>>>> +You should have received a copy of the GNU General Public License and
>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>>>> +<http://www.gnu.org/licenses/>. */
>>>> +
>>>> +DEF_ENUM
>>>> +DEF_ALG (no_stringop, no_stringop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (libcall, libcall)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_1_byte, rep_byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_4_byte, rep_4byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (rep_prefix_8_byte, rep_8byte)
>>>> +DEF_ENUM
>>>> +DEF_ALG (loop_1_byte, byte_loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (loop, loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (unrolled_loop, unrolled_loop)
>>>> +DEF_ENUM
>>>> +DEF_ALG (vector_loop, vector_loop)
>>>> Index: config/i386/i386.opt
>>>> ===================================================================
>>>> --- config/i386/i386.opt (revision 201458)
>>>> +++ config/i386/i386.opt (working copy)
>>>> @@ -316,6 +316,14 @@ mstack-arg-probe
>>>> Target Report Mask(STACK_PROBE) Save
>>>> Enable stack probing
>>>>
>>>> +mmemcpy-strategy=
>>>> +Target RejectNegative Joined Var(ix86_tune_memcpy_strategy)
>>>> +Specify memcpy expansion strategy when expected size is known
>>>> +
>>>> +mmemset-strategy=
>>>> +Target RejectNegative Joined Var(ix86_tune_memset_strategy)
>>>> +Specify memset expansion strategy when expected size is known
>>>> +
>>>> mstringop-strategy=
>>>> Target RejectNegative Joined Enum(stringop_alg) Var(ix86_stringop_alg) Init(no_stringop)
>>>> Chose strategy to generate stringop using
>>>> Index: config/i386/stringop.opt
>>>> ===================================================================
>>>> --- config/i386/stringop.opt (revision 0)
>>>> +++ config/i386/stringop.opt (revision 0)
>>>> @@ -0,0 +1,36 @@
>>>> +/* Definitions for option handling for IA-32.
>>>> +   Copyright (C) 2013 Free Software Foundation, Inc.
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify
>>>> +it under the terms of the GNU General Public License as published by
>>>> +the Free Software Foundation; either version 3, or (at your option)
>>>> +any later version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful,
>>>> +but WITHOUT ANY WARRANTY; without even the implied warranty of
>>>> +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>>> +GNU General Public License for more details.
>>>> +
>>>> +Under Section 7 of GPL version 3, you are granted additional
>>>> +permissions described in the GCC Runtime Library Exception, version
>>>> +3.1, as published by the Free Software Foundation.
>>>> +
>>>> +You should have received a copy of the GNU General Public License and
>>>> +a copy of the GCC Runtime Library Exception along with this program;
>>>> +see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
>>>> +<http://www.gnu.org/licenses/>.
*/
>>>> +
>>>> +Enum(stringop_alg) String(rep_byte) Value(rep_prefix_1_byte)
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#define DEF_ENUM EnumValue
>>>> +
>>>> +#undef DEF_ALG
>>>> +#define DEF_ALG(alg, name) Enum(stringop_alg) String(name) Value(alg)
>>>> +
>>>> +#include "stringop.def"
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>> Index: config/i386/i386.c
>>>> ===================================================================
>>>> --- config/i386/i386.c (revision 201458)
>>>> +++ config/i386/i386.c (working copy)
>>>> @@ -156,7 +156,7 @@ struct processor_costs ix86_size_cost =
>>>> };
>>>>
>>>> /* Processor costs (relative to an add) */
>>>> -static const
>>>> +static
>>>> struct processor_costs i386_cost = { /* 386 specific costs */
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>> @@ -226,7 +226,7 @@ struct processor_costs i386_cost = { /*
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs i486_cost = { /* 486 specific costs */
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>> @@ -298,7 +298,7 @@ struct processor_costs i486_cost = { /*
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs pentium_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>> @@ -368,7 +368,7 @@ struct processor_costs pentium_cost = {
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs pentiumpro_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>> @@ -447,7 +447,7 @@ struct processor_costs pentiumpro_cost =
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs geode_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>> @@ -518,7 +518,7 @@ struct processor_costs geode_cost = {
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs k6_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (2), /* cost of a lea instruction */
>>>> @@ -591,7 +591,7 @@ struct processor_costs k6_cost = {
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs athlon_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (2), /* cost of a lea instruction */
>>>> @@ -664,7 +664,7 @@ struct processor_costs athlon_cost = {
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs k8_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (2), /* cost of a lea instruction */
>>>> @@ -1265,7 +1265,7 @@ struct processor_costs btver2_cost = {
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs pentium4_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (3), /* cost of a lea instruction */
>>>> @@ -1336,7 +1336,7 @@ struct processor_costs pentium4_cost = {
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs nocona_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (1), /* cost of a lea instruction */
>>>> @@ -1409,7 +1409,7 @@ struct processor_costs nocona_cost = {
>>>>   1, /* cond_not_taken_branch_cost. */
>>>> };
>>>>
>>>> -static const
>>>> +static
>>>> struct processor_costs atom_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */
>>>> @@ -1556,7 +1556,7 @@ struct processor_costs slm_cost = {
>>>> };
>>>>
>>>> /* Generic64 should produce code tuned for Nocona and K8. */
>>>> -static const
>>>> +static
>>>> struct processor_costs generic64_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   /* On all chips taken into consideration lea is 2 cycles and more. With
>>>> @@ -1635,7 +1635,7 @@ struct processor_costs generic64_cost =
>>>> };
>>>>
>>>> /* core_cost should produce code tuned for Core familly of CPUs. */
>>>> -static const
>>>> +static
>>>> struct processor_costs core_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   /* On all chips taken into consideration lea is 2 cycles and more. With
>>>> @@ -1717,7 +1717,7 @@ struct processor_costs core_cost = {
>>>>
>>>> /* Generic32 should produce code tuned for PPro, Pentium4, Nocona,
>>>>    Athlon and K8. */
>>>> -static const
>>>> +static
>>>> struct processor_costs generic32_cost = {
>>>>   COSTS_N_INSNS (1), /* cost of an add instruction */
>>>>   COSTS_N_INSNS (1) + 1, /* cost of a lea instruction */
>>>> @@ -2900,6 +2900,150 @@ ix86_debug_options (void)
>>>>
>>>>   return;
>>>> }
>>>> +
>>>> +static const char *stringop_alg_names[] = {
>>>> +#define DEF_ENUM
>>>> +#define DEF_ALG(alg, name) #name,
>>>> +#include "stringop.def"
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>> +};
>>>> +
>>>> +/* Parse parameter string passed to -mmemcpy-strategy= or -mmemset-strategy=.
>>>> +   The string is of the following form (or comma separated list of it):
>>>> +
>>>> +     strategy_alg:max_size:[align|noalign]
>>>> +
>>>> +   where the full size range for the strategy is either [0, max_size] or
>>>> +   [min_size, max_size], in which min_size is the max_size + 1 of the
>>>> +   preceding range.
The last size range must have max_size == -1.
>>>> +
>>>> +   Examples:
>>>> +
>>>> +   1.
>>>> +      -mmemcpy-strategy=libcall:-1:noalign
>>>> +
>>>> +      this is equivalent to (for known size memcpy) -mstringop-strategy=libcall
>>>> +
>>>> +
>>>> +   2.
>>>> +      -mmemset-strategy=rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign
>>>> +
>>>> +      This is to tell the compiler to use the following strategy for memset
>>>> +      1) when the expected size is between [1, 16], use rep_8byte strategy;
>>>> +      2) when the size is between [17, 2048], use vector_loop;
>>>> +      3) when the size is > 2048, use libcall.
>>>> +
>>>> +*/
>>>> +
>>>> +struct stringop_size_range
>>>> +{
>>>> +  int min;
>>>> +  int max;
>>>> +  stringop_alg alg;
>>>> +  bool noalign;
>>>> +};
>>>> +
>>>> +static void
>>>> +ix86_parse_stringop_strategy_string (char *strategy_str, bool is_memset)
>>>> +{
>>>> +  const struct stringop_algs *default_algs;
>>>> +  stringop_size_range input_ranges[MAX_STRINGOP_ALGS];
>>>> +  char *curr_range_str, *next_range_str;
>>>> +  int i = 0, n = 0;
>>>> +
>>>> +  if (is_memset)
>>>> +    default_algs = &ix86_cost->memset[TARGET_64BIT != 0];
>>>> +  else
>>>> +    default_algs = &ix86_cost->memcpy[TARGET_64BIT != 0];
>>>> +
>>>> +  curr_range_str = strategy_str;
>>>> +
>>>> +  do {
>>>> +
>>>> +    int mins, maxs;
>>>> +    stringop_alg alg;
>>>> +    char alg_name[128];
>>>> +    char align[16];
>>>> +
>>>> +    next_range_str = strchr (curr_range_str, ',');
>>>> +    if (next_range_str)
>>>> +      *next_range_str++ = '\0';
>>>> +
>>>> +    if (3 != sscanf (curr_range_str, "%[^:]:%d:%s", alg_name, &maxs, align))
>>>> +      {
>>>> +        warning (0, "Wrong arg %s to option %s", curr_range_str,
>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +        return;
>>>> +      }
>>>> +
>>>> +    if (n > 0 && (maxs < (mins = input_ranges[n - 1].max + 1) && maxs != -1))
>>>> +      {
>>>> +        warning (0, "Size ranges of option %s should be increasing",
>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +        return;
>>>> +      }
>>>> +
>>>> +    for (i = 0; i < last_alg; i++)
>>>> +      {
>>>> +        if (!strcmp (alg_name, stringop_alg_names[i]))
>>>> +          {
>>>> +            alg = (stringop_alg) i;
>>>> +            break;
>>>> +          }
>>>> +      }
>>>> +
>>>> +    if (i == last_alg)
>>>> +      {
>>>> +        warning (0, "Wrong stringop strategy name %s specified for option %s",
>>>> +                 alg_name,
>>>> +                 is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +        return;
>>>> +      }
>>>> +
>>>> +    input_ranges[n].min = mins;
>>>> +    input_ranges[n].max = maxs;
>>>> +    input_ranges[n].alg = alg;
>>>> +    if (!strcmp (align, "align"))
>>>> +      input_ranges[n].noalign = false;
>>>> +    else if (!strcmp (align, "noalign"))
>>>> +      input_ranges[n].noalign = true;
>>>> +    else
>>>> +      {
>>>> +        warning (0, "Unknown alignment %s specified for option %s",
>>>> +                 align, is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +        return;
>>>> +      }
>>>> +    n++;
>>>> +    curr_range_str = next_range_str;
>>>> +  } while (curr_range_str);
>>>> +
>>>> +  if (input_ranges[n - 1].max != -1)
>>>> +    {
>>>> +      warning (0, "The max value for the last size range should be -1"
>>>> +               " for option %s",
>>>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +      return;
>>>> +    }
>>>> +
>>>> +  if (n > MAX_STRINGOP_ALGS)
>>>> +    {
>>>> +      warning (0, "Too many size ranges specified in option %s",
>>>> +               is_memset ? "-mmemset_strategy=" : "-mmemcpy_strategy=");
>>>> +      return;
>>>> +    }
>>>> +
>>>> +  /* Now override the default algs array */
>>>> +  for (i = 0; i < n; i++)
>>>> +    {
>>>> +      *const_cast<int *>(&default_algs->size[i].max) = input_ranges[i].max;
>>>> +      *const_cast<stringop_alg *>(&default_algs->size[i].alg)
>>>> +        = input_ranges[i].alg;
>>>> +      *const_cast<int *>(&default_algs->size[i].noalign)
>>>> +        = input_ranges[i].noalign;
>>>> +    }
>>>> +}
>>>> +
>>>>
>>>> /* Override various settings based on options.
If MAIN_ARGS_P, the
>>>>    options are from the command line, otherwise they are from
>>>> @@ -4021,6 +4165,21 @@ ix86_option_override_internal (bool main
>>>>   /* Handle stack protector */
>>>>   if (!global_options_set.x_ix86_stack_protector_guard)
>>>>     ix86_stack_protector_guard = TARGET_HAS_BIONIC ? SSP_GLOBAL : SSP_TLS;
>>>> +
>>>> +  /* Handle -mmemcpy-strategy= and -mmemset-strategy= */
>>>> +  if (ix86_tune_memcpy_strategy)
>>>> +    {
>>>> +      char *str = xstrdup (ix86_tune_memcpy_strategy);
>>>> +      ix86_parse_stringop_strategy_string (str, false);
>>>> +      free (str);
>>>> +    }
>>>> +
>>>> +  if (ix86_tune_memset_strategy)
>>>> +    {
>>>> +      char *str = xstrdup (ix86_tune_memset_strategy);
>>>> +      ix86_parse_stringop_strategy_string (str, true);
>>>> +      free (str);
>>>> +    }
>>>> }
>>>>
>>>> /* Implement the TARGET_OPTION_OVERRIDE hook. */
>>>> @@ -22903,6 +23062,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>     {
>>>>     case libcall:
>>>>     case no_stringop:
>>>> +    case last_alg:
>>>>       gcc_unreachable ();
>>>>     case loop_1_byte:
>>>>       need_zero_guard = true;
>>>> @@ -23093,6 +23253,7 @@ ix86_expand_movmem (rtx dst, rtx src, rt
>>>>     {
>>>>     case libcall:
>>>>     case no_stringop:
>>>> +    case last_alg:
>>>>       gcc_unreachable ();
>>>>     case loop_1_byte:
>>>>     case loop:
>>>> @@ -23304,6 +23465,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>     {
>>>>     case libcall:
>>>>     case no_stringop:
>>>> +    case last_alg:
>>>>       gcc_unreachable ();
>>>>     case loop:
>>>>       need_zero_guard = true;
>>>> @@ -23481,6 +23643,7 @@ ix86_expand_setmem (rtx dst, rtx count_e
>>>>     {
>>>>     case libcall:
>>>>     case no_stringop:
>>>> +    case last_alg:
>>>>       gcc_unreachable ();
>>>>     case loop_1_byte:
>>>>     case loop:
>>>> Index: config/i386/i386-opts.h
>>>> ===================================================================
>>>> --- config/i386/i386-opts.h (revision 201458)
>>>> +++ config/i386/i386-opts.h (working copy)
>>>> @@ -28,15 +28,17 @@ see the files COPYING3 and COPYING.RUNTI
>>>> /* Algorithm to expand string function with. */
>>>> enum stringop_alg
>>>> {
>>>> -  no_stringop,
>>>> -  libcall,
>>>> -  rep_prefix_1_byte,
>>>> -  rep_prefix_4_byte,
>>>> -  rep_prefix_8_byte,
>>>> -  loop_1_byte,
>>>> -  loop,
>>>> -  unrolled_loop,
>>>> -  vector_loop
>>>> +#undef DEF_ENUM
>>>> +#define DEF_ENUM
>>>> +
>>>> +#undef DEF_ALG
>>>> +#define DEF_ALG(alg, name) alg,
>>>> +
>>>> +#include "stringop.def"
>>>> +last_alg
>>>> +
>>>> +#undef DEF_ENUM
>>>> +#undef DEF_ALG
>>>> };
>>>>
>>>> /* Available call abi. */
>>>> Index: doc/invoke.texi
>>>> ===================================================================
>>>> --- doc/invoke.texi (revision 201458)
>>>> +++ doc/invoke.texi (working copy)
>>>> @@ -649,6 +649,7 @@ Objective-C and Objective-C++ Dialects}.
>>>>  -mbmi2 -mrtm -mlwp -mthreads @gol
>>>>  -mno-align-stringops -minline-all-stringops @gol
>>>>  -minline-stringops-dynamically -mstringop-strategy=@var{alg} @gol
>>>> +-mmemcpy-strategy=@var{strategy} -mmemset-strategy=@var{strategy}
>>>>  -mpush-args -maccumulate-outgoing-args -m128bit-long-double @gol
>>>>  -m96bit-long-double -mlong-double-64 -mlong-double-80 @gol
>>>>  -mregparm=@var{num} -msseregparm @gol
>>>> @@ -14598,6 +14599,24 @@ Expand into an inline loop.
>>>> Always use a library call.
>>>> @end table
>>>>
>>>> +@item -mmemcpy-strategy=@var{strategy}
>>>> +@opindex mmemcpy-strategy=@var{strategy}
>>>> +Override the internal decision heuristic to decide if @code{__builtin_memcpy}
>>>> +should be inlined and what inline algorithm to use when the expected size
>>>> +of the copy operation is known. @var{strategy}
>>>> +is a comma-separated list of @var{alg}:@var{max_size}:@var{dest_align} triplets.
>>>> +@var{alg} is specified in @option{-mstringop-strategy}, @var{max_size} specifies
>>>> +the max byte size with which inline algorithm @var{alg} is allowed. For the last
>>>> +triplet, the @var{max_size} must be @code{-1}. The @var{max_size} of the triplets
>>>> +in the list must be specified in increasing order.
The minimal byte size for
>>>> +@var{alg} is @code{0} for the first triplet and @code{@var{max_size} + 1} of the
>>>> +preceding range.
>>>> +
>>>> +@item -mmemset-strategy=@var{strategy}
>>>> +@opindex mmemset-strategy=@var{strategy}
>>>> +The option is similar to @option{-mmemcpy-strategy=} except that it is to control
>>>> +@code{__builtin_memset} expansion.
>>>> +
>>>> @item -momit-leaf-frame-pointer
>>>> @opindex momit-leaf-frame-pointer
>>>> Don't keep the frame pointer in a register for leaf functions. This
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-1.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-1.c (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-2.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-2.c (revision 0)
>>>> @@ -0,0 +1,12 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:3000:align,libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 8 { target { ! { ia32 } } } } } */
>>>> +/* { dg-final { scan-assembler-times "movdqa" 4 { target { ia32 } } } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memset-strategy-1.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memset-strategy-1.c (revision 0)
>>>> +++ testsuite/gcc.target/i386/memset-strategy-1.c (revision 0)
>>>> @@ -0,0 +1,10 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemset-strategy=libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "memset" 2 } } */
>>>> +
>>>> +char a[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memset (a, 1, 2048);
>>>> +}
>>>> +
>>>> Index: testsuite/gcc.target/i386/memcpy-strategy-3.c
>>>> ===================================================================
>>>> --- testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0)
>>>> +++ testsuite/gcc.target/i386/memcpy-strategy-3.c (revision 0)
>>>> @@ -0,0 +1,11 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -march=atom -mmemcpy-strategy=vector_loop:2000:align,libcall:-1:align" } */
>>>> +/* { dg-final { scan-assembler-times "memcpy" 2 } } */
>>>> +
>>>> +char a[2048];
>>>> +char b[2048];
>>>> +void t (void)
>>>> +{
>>>> +  __builtin_memcpy (a, b, 2048);
>>>> +}
>>>> +
>>>
>
>
>
> --
> ---
> Best regards,
> Michael V. Zolotukhin,
> Software Engineer
> Intel Corporation.
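
For readers who want to try the alg:max_size:align triplet format outside of GCC, here is a minimal standalone sketch in C. It is not the code from the patch: the names parse_strategy, struct size_range and MAX_RANGES are hypothetical, the algorithm name is kept as a plain string instead of being mapped to stringop_alg, and it assumes the first range starts at 0 as described in the invoke.texi text. Unlike the posted function, it checks the range limit before storing an entry.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_RANGES 4  /* Hypothetical limit; stands in for MAX_STRINGOP_ALGS.  */

struct size_range
{
  int min;       /* 0 for the first range, previous max + 1 otherwise.  */
  int max;       /* -1 means "no upper bound" and must come last.  */
  char alg[32];  /* Algorithm name, kept as text in this sketch.  */
  int noalign;   /* 1 for "noalign", 0 for "align".  */
};

/* Parse STR ("alg:max:align[,alg:max:align...]") into RANGES.
   Return the number of ranges, or -1 if the string is malformed.  */
static int
parse_strategy (const char *str, struct size_range *ranges)
{
  char *copy, *curr, *next;
  int n = 0, ok = 1;

  assert (str != NULL && ranges != NULL);
  copy = malloc (strlen (str) + 1);
  if (!copy)
    return -1;
  strcpy (copy, str);
  curr = copy;

  while (curr && ok)
    {
      char align[16];
      int min = (n == 0) ? 0 : ranges[n - 1].max + 1;

      /* Split off the next comma-separated triplet.  */
      next = strchr (curr, ',');
      if (next)
        *next++ = '\0';

      if (n == MAX_RANGES)
        ok = 0;  /* Too many ranges; checked before writing.  */
      else if (sscanf (curr, "%31[^:]:%d:%15s",
                       ranges[n].alg, &ranges[n].max, align) != 3)
        ok = 0;  /* Not of the form alg:max:align.  */
      else if (n > 0 && ranges[n - 1].max == -1)
        ok = 0;  /* Nothing may follow the open-ended range.  */
      else if (ranges[n].max != -1 && ranges[n].max < min)
        ok = 0;  /* max_size values must be increasing.  */
      else if (strcmp (align, "align") == 0)
        ranges[n].noalign = 0;
      else if (strcmp (align, "noalign") == 0)
        ranges[n].noalign = 1;
      else
        ok = 0;  /* Unknown alignment keyword.  */

      if (ok)
        {
          ranges[n].min = min;
          n++;
          curr = next;
        }
    }

  /* The last range must be open-ended.  */
  if (ok && (n == 0 || ranges[n - 1].max != -1))
    ok = 0;

  free (copy);
  return ok ? n : -1;
}
```

Called on the memset example from the patch comment, rep_8byte:16:noalign,vector_loop:2048:align,libcall:-1:noalign, this sketch yields three ranges covering [0, 16], [17, 2048] and [2049, unbounded].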