(resending due to something going wrong with my email client)

On 02/03/2026 01:49, Yangyu Chen wrote:
> This patch adds support for target_clones table option. The
> target_clones table option allows users to specify multiple versions
> of a function and select the version at runtime based on the specified
> table.
>
> The target clone table is a JSON object where function names serve as
> keys, and their values are nested objects. Each nested object maps
> architecture names to lists of target clone attributes. This structure
> allows specifying function clones for different architectures. Below is
> an example:
>
> ```
> {
>   "foo": {
>     "x86_64": ["avx2", "avx512f"],
>     "riscv64": ["arch=+v", "arch=+zba,+zbb", ...],
>     ... // more architectures
>   },
>   // more functions
> }
> ```
>
> An example of the table is as follows on RISC-V:
>
> C source code "ror32.c":
>
> ```
> void ror32(unsigned int *a, unsigned int b, unsigned long size) {
>   for (unsigned long i = 0; i < size; i++) {
>     a[i] = a[i] >> b | (a[i] << (32 - b));
>   }
> }
> ```
>
> Table "ror32.target_clones":
>
> ```
> {
>   "ror32": {
>     "riscv64": ["arch=+zvbb,+zbb", "arch=+zbb"]
>   }
> }
> ```
>
> Then use: gcc -O3 -ftarget-clones-table=ror32.target_clones -S ror32.c
> to compile the source code. This will generate 3 versions and its IFUNC
> resolver for the ror32 function which is "arch=+zvbb,+zbb" and
> "arch=+zbb" and the default version.

FWIW, I still think this is a great feature.
One minor comment below.
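
For anyone following along: my reading of the ror32 example quoted above is
that the table entry has the same effect as writing the attribute by hand.
A minimal sketch, assuming "default" is appended automatically as the
documentation hunk below describes. The TCT_CLONES macro name is made up for
illustration, the target guard is only there so the file still builds on
other architectures, and extern "C" keeps the symbol name equal to the
"ror32" table key:

```cpp
#include <cassert>

// Hypothetical macro: expands to the hand-written attribute on riscv64,
// and to nothing elsewhere so the file still compiles on other targets.
#if defined(__riscv) && __riscv_xlen == 64
#define TCT_CLONES \
  __attribute__ ((target_clones ("arch=+zvbb,+zbb", "arch=+zbb", "default")))
#else
#define TCT_CLONES
#endif

// Same loop as ror32.c.  B must be in 1..31; B == 0 would shift by 32.
extern "C" TCT_CLONES void
ror32 (unsigned int *a, unsigned int b, unsigned long size)
{
  assert (b > 0 && b < 32);
  for (unsigned long i = 0; i < size; i++)
    a[i] = a[i] >> b | (a[i] << (32 - b));
}
```

With the table option, the source stays as in ror32.c and none of the
attribute boilerplate is needed.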

>
> Signed-off-by: Yangyu Chen <[email protected]>
>
> gcc/ChangeLog:
>
>       * Makefile.in: Add -DTARGET_NAME to CFLAGS-multiple_target.o.
>       * common.opt: Add target clone table option.
>       * multiple_target.cc (expand_target_clones): Add support for
>       target_clones table option.
>       (node_versionable_function_p): New function to check if a function
>       can be versioned.
>       (init_clone_map): Ditto.
>       (ipa_target_clone): Ditto.
>       * doc/invoke.texi: Add document for target clone table option.
>
> gcc/testsuite/ChangeLog:
>
>       * gcc.target/i386/tct-0.c: New test.
>       * gcc.target/i386/tct-0.json: New test.
> ---
>  gcc/Makefile.in                          |   1 +
>  gcc/common.opt                           |   7 +
>  gcc/doc/invoke.texi                      |  69 +++++++++-
>  gcc/multiple_target.cc                   | 163 +++++++++++++++++++++--
>  gcc/testsuite/gcc.target/i386/tct-0.c    |  11 ++
>  gcc/testsuite/gcc.target/i386/tct-0.json |   5 +
>  6 files changed, 246 insertions(+), 10 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/tct-0.c
>  create mode 100644 gcc/testsuite/gcc.target/i386/tct-0.json
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index abf98aabac83..45f3d9fb61db 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -2728,6 +2728,7 @@ s-bversion: BASE-VER
>       $(STAMP) s-bversion
>  
>  CFLAGS-toplev.o += -DTARGET_NAME=\"$(target_noncanonical)\"
> +CFLAGS-multiple_target.o += -DTARGET_NAME=\"$(target_noncanonical)\"
>  CFLAGS-tree-diagnostic-client-data-hooks.o += -DTARGET_NAME=\"$(target_noncanonical)\"
>  CFLAGS-optinfo-emit-json.o += -DTARGET_NAME=\"$(target_noncanonical)\" $(ZLIBINC)
>  CFLAGS-analyzer/engine.o += $(ZLIBINC)
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 88b79bbf8f56..a23d1288dd25 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -2741,6 +2741,13 @@ fprofile-reorder-functions
>  Common Var(flag_profile_reorder_functions) Optimization
>  Enable function reordering that improves code placement.
>  
> +ftarget-clones-table=
> +Common Joined RejectNegative Var(target_clones_table)
> +Enable target clones attributes specified in the JSON file.
> +
> +Variable
> +const char *target_clones_table = NULL
> +
>  fpatchable-function-entry=
>  Common Var(flag_patchable_function_entry) Joined RejectNegative Optimization
>  Insert NOP instructions at each function entry.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 58254b82b0e0..64e8ff190ab9 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -677,7 +677,7 @@ Objective-C and Objective-C++ Dialects}.
>  -fspeculatively-call-stored-functions  -fsplit-paths
>  -fsplit-wide-types  -fsplit-wide-types-early  -fssa-backprop  -fssa-phiopt
>  -fstdarg-opt  -fstore-merging  -fstrict-aliasing -fipa-strict-aliasing
> --fthread-jumps  -ftracer  -ftree-bit-ccp
> +-ftarget-clones-table=@var{path} -fthread-jumps  -ftracer  -ftree-bit-ccp
>  -ftree-builtin-call-dce  -ftree-ccp  -ftree-ch  -ftree-coalesce-vars
>  -ftree-copy-prop  -ftree-cselim  -ftree-dce  -ftree-dominator-opts
>  -ftree-dse  -ftree-forwprop  -ftree-fre  -fcode-hoisting
> @@ -13937,6 +13937,73 @@ assumptions based on that.
>  
>  The default is @option{-fzero-initialized-in-bss} except in Ada.
>  
> +@opindex ftarget-clones-table
> +@item -ftarget-clones-table=@var{path}
> +Read JSON formatted target_clone table to insert @code{target_clones}
> +attribute to functions based on the table contents.  Where @var{path} is a
> +file path to the JSON formatted target_clone table.  When attribute
> +@code{target_clones} is applied to a function, the compiler creates
> +multiple versions of the function for different targets.  When attribute
> +@code{target_clones} also exist on the function definition, the targets
> +specified in the attribute and the targets specified in the JSON formatted
> +target_clone table are merged.
> +
> +This is useful when you want to optimize a function for different CPU
> +features and select the appropriate version at runtime based on the CPU
> +features of the machine the program is running on without source code
> +modification.
> +
> +This table is formatted as:
> +
> +@smallexample
> +@{
> +  "mangled_function_name1": @{
> +    "target1": ["attr1", ...],
> +    ...
> +  @},
> +  ...
> +@}
> +@end smallexample
> +
> +For example, say you have a C++ function named @code{foo} with return type
> +@code{void} without parameters, and you want to clone it for targets
> +@code{arch=x86-64-v3} and @code{arch=x86-64-v4} on x86-64 architecture.
> +You can create a JSON formatted target_clone table file named
> +@file{target_clones.json} with the following contents:
> +
> +@smallexample
> +@{
> +  "_Z3foov": @{
> +    "x86_64": ["arch=x86-64-v3", "arch=x86-64-v4"]
> +  @}
> +@}
> +@end smallexample
> +
> +Then, you can use @option{-ftarget-clones-table=target_clones.json}
> +to clone the function @code{foo} for the targets
> +@code{arch=x86-64-v3} and @code{arch=x86-64-v4}. This is equivalent
> +to adding the following attribute to the function definition:
> +
> +@smallexample
> +#ifdef __x86_64__
> +__attribute__ ((target_clones ("arch=x86-64-v3", "arch=x86-64-v4", "default")))
> +#endif
> +void foo (void) @{ ... @}
> +@end smallexample
> +
> +This design also support to have multiple architectures in the same JSON
> +formatted target_clone table.  For other architectures like aarch64 and
> +riscv64, you can add entries like the following:
> +
> +@smallexample
> +@{
> +  "_Z3barv": @{
> +    "aarch64": ["sve2"],
> +    "riscv64": ["arch=+v,+zvfh"]
> +  @}
> +@}
> +@end smallexample
> +
>  @opindex fthread-jumps
>  @opindex fno-thread-jumps
>  @item -fthread-jumps
> diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc
> index 3f8676f59435..532461458948 100644
> --- a/gcc/multiple_target.cc
> +++ b/gcc/multiple_target.cc
> @@ -21,6 +21,10 @@ along with GCC; see the file COPYING3.  If not see
>  <http://www.gnu.org/licenses/>.  */
>  
>  #include "config.h"
> +#include <fstream>
> +#define INCLUDE_MAP
> +#define INCLUDE_STRING
> +#define INCLUDE_SSTREAM
>  #include "system.h"
>  #include "coretypes.h"
>  #include "backend.h"
> @@ -38,6 +42,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-walk.h"
>  #include "tree-inline.h"
>  #include "intl.h"
> +#include "json-parsing.h"
>  
>  /* Walker callback that replaces all FUNCTION_DECL of a function that's
>     going to be versioned.  */
> @@ -252,30 +257,76 @@ create_target_clone (cgraph_node *node, bool definition, char *name,
>    return new_node;
>  }
>  
> +/* Skip functions that are declared but not defined.  Also skip C++
> +   virtual functions, as they cannot be cloned.  The same logic is in the
> +   function expand_target_clones below.  */
> +static bool node_versionable_function_p (cgraph_node *node)
> +{
> +  return (!node->definition
> +       || (!node->alias && tree_versionable_function_p (node->decl)))
> +       && !DECL_VIRTUAL_P (node->decl);
> +}
> +
>  /* If the function in NODE has multiple target attributes
>     create the appropriate clone for each valid target attribute.  */
>  
>  static bool
> -expand_target_clones (struct cgraph_node *node, bool definition)
> +expand_target_clones (struct cgraph_node *node, bool definition,
> +                   std::map <std::string, auto_vec<string_slice> >
> +                   &clone_map)
>  {
>    /* Parsing target attributes separated by TARGET_CLONES_ATTR_SEPARATOR.  */
>    tree attr_target = lookup_attribute ("target_clones",
>                                      DECL_ATTRIBUTES (node->decl));
> -  /* No targets specified.  */
> -  if (!attr_target)
> -    return false;
> -
>    int num_defaults = 0;
>    auto_vec<string_slice> attr_list = get_clone_versions (node->decl,
>                                                        &num_defaults);
> -
> -  /* If the target clones list is empty after filtering, remove this node.  */
> -  if (!TARGET_HAS_FMV_TARGET_ATTRIBUTE && attr_list.is_empty ())
> +  /* When target_clones attribute is present, but there is no valid
> +     entries, we believe there can be another function with the same name
> +     but has "target_version" specified, so we remove this function, this
> +     only applies to aarch64 for now.  For RISC-V, it will give an error
> +     earlier.
> +
> +     This can barely happen in practice as the "default" attribute can
> +     always be added to avoid this.  Thus, we even skip target clones table
> +     lookup in this case.  Any following architectures that use
> +     "target_version" semantics should aware of this behaviour if it will
> +     not give error but just skip during attribute checking.  */
> +  if (!TARGET_HAS_FMV_TARGET_ATTRIBUTE && attr_list.is_empty () && attr_target)
>      {
>        node->remove ();
>        return false;
>      }
>  
> +  if (DECL_INITIAL (node->decl) != NULL_TREE)
> +    {
> +      auto it = clone_map.find (IDENTIFIER_POINTER (
> +                             DECL_ASSEMBLER_NAME_RAW (node->decl)));
> +      if (it != clone_map.end () && node_versionable_function_p (node))

I think it's worth checking here that the node is the default version.

A check like:

(!DECL_FUNCTION_VERSIONED (node->decl)
 || is_function_default_version (node->decl))

should cover it.

This avoids doing the wrong thing in a case like:

int foo [[gnu::target_version("XYZ")]] () { ... }
int foo [[gnu::target_version("default")]] () { ... }


{
  "foo": {
    "MY_ARCH": ["ABC"]
  }
}
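
To spell out the intent, here is a small standalone model of the check. It
is not GCC code: the fn_version struct and both helper names are made up,
and they only stand in for what DECL_FUNCTION_VERSIONED and
is_function_default_version report about a decl.

```cpp
#include <string>

// Hypothetical stand-in for the two facts the suggested check reads
// off a decl.
struct fn_version
{
  bool versioned;      // models DECL_FUNCTION_VERSIONED (decl)
  std::string target;  // models the target_version string, "" if none
};

// Models is_function_default_version (decl) for a versioned function.
static bool
is_default_version (const fn_version &f)
{
  return f.versioned && f.target == "default";
}

// The suggested guard: only unversioned functions and the default
// version pick up entries from the table.
static bool
table_applies_p (const fn_version &f)
{
  return !f.versioned || is_default_version (f);
}
```

With the two definitions of foo above, the "XYZ" version would be left
alone and only the "default" one would get "ABC" merged in.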

Otherwise, LGTM (but cannot approve).

> +     {
> +       /* Merge valid target attributes from -ftarget-clones-table.  */
> +       for (string_slice attr : it->second)
> +         if (targetm.check_target_clone_version (attr, NULL))
> +           attr_list.safe_push (attr);
> +         else
> +             warning_at (DECL_SOURCE_LOCATION (node->decl),
> +                         0, "ignoring unsupported target clone "
> +                         "version '%B' from target clones table",
> +                         &attr);
> +
> +       if (num_defaults == 0)
> +         {
> +           /* No default in the source attribute, add one.  */
> +           attr_list.safe_push ("default");
> +           num_defaults = 1;
> +         }
> +     }
> +    }
> +
> +  /* If there is no target_clones attribute, nothing to do.  */
> +  if (attr_list.is_empty ())
> +      return false;
> +
>    /* No need to clone for 1 target attribute.  */
>    if (attr_list.length () == 1 && TARGET_HAS_FMV_TARGET_ATTRIBUTE)
>      {
> @@ -545,11 +596,105 @@ is_simple_target_clones_case (cgraph_node *node)
>    return true;
>  }
>  
> +/* Initialize the clone map from the target clone table JSON file.  Specified
> +   by the -ftarget-clone-table option.  The map is a mapping from symbol name
> +   to a string with target clones attributes separated by
> +   TARGET_CLONES_ATTR_SEPARATOR.  */
> +static std::map <std::string, auto_vec<string_slice> >
> +init_clone_map (void)
> +{
> +  std::map <std::string, auto_vec<string_slice> > res;
> +  if (! target_clones_table)
> +    return res;
> +
> +  /* Take target string from TARGET_NAME, this macro looks like
> +     "x86_64-linux-gnu" and we need to strip all the suffixes
> +     after the first dash, so it becomes "x86_64".  */
> +  std::string target = TARGET_NAME;
> +  if (target.find ('-') != std::string::npos)
> +    target.erase (target.find ('-'));
> +
> +  /* Open the target clone table file and read to a string.  */
> +  std::ifstream json_file (target_clones_table);
> +  if (json_file.fail ())
> +    {
> +      error ("cannot open target clone table file %s",
> +          target_clones_table);
> +      return res;
> +    }
> +  std::stringstream ss_buf;
> +  ss_buf << json_file.rdbuf ();
> +  std::string json_str = ss_buf.str ();
> +
> +  /* Parse the JSON string.
> +     The JSON string format looks like this:
> +     {
> +       "symbol_name1": {
> +      "target1": ["clone1", "clone2", ...],
> +      "target2": ["clone1", "clone2", ...],
> +       },
> +       ...
> +     }
> +     where symbol_name is the ASM name of the function mangled by the
> +     frontend.  The target1 and target2 are the targets, which can be
> +     "x86_64", "aarch64", "riscv64", etc.  The clone1, clone2, etc are the
> +     target clones attributes, which can be "avx2", "avx512" etc.  Note that
> +     there is no need to specify the "default" target clone, it is
> +     automatically added by the pass.  */
> +  json::parser_result_t result = json::parse_utf8_string (
> +    json_str.size (), json_str.c_str (), true, NULL);
> +  if (auto json_err = result.m_err.get ())
> +    {
> +      error ("error parsing target clone table file %s: %s",
> +          target_clones_table, json_err->get_msg ());
> +      return res;
> +    }
> +
> +  auto json_val = result.m_val.get ();
> +  auto kind = json_val->get_kind ();
> +  if (kind != json::JSON_OBJECT)
> +    {
> +      error ("target clone table file %s is not a JSON object",
> +          target_clones_table);
> +      return res;
> +    }
> +  auto json_obj = static_cast<const json::object *> (json_val);
> +  unsigned i;
> +  const char *symbol_name;
> +  FOR_EACH_VEC_ELT (*json_obj, i, symbol_name)
> +    {
> +      auto symbol_val = json_obj->get (symbol_name);
> +      if (!symbol_val || symbol_val->get_kind () != json::JSON_OBJECT)
> +     continue;
> +      auto symbol_obj = static_cast<const json::object *> (symbol_val);
> +      auto cur_target_val = symbol_obj->get (target.c_str ());
> +      if (!cur_target_val
> +       || cur_target_val->get_kind () != json::JSON_ARRAY)
> +     continue;
> +      auto cur_target_array = static_cast<const json::array *>
> +     (cur_target_val);
> +      for (unsigned j = 0; j < cur_target_array->length (); j++)
> +     {
> +       auto target_str_val = cur_target_array->get (j);
> +       if (target_str_val->get_kind () != json::JSON_STRING)
> +         error ("target clones attribute is not a string");
> +       const char *target_str
> +         = static_cast<const json::string *> (target_str_val)->get_string ();
> +       if (strcmp (target_str, "default") == 0)
> +           error ("No need to specify \"default\" in target clones table");
> +       res[symbol_name].safe_push (string_slice (ggc_strdup (target_str)));
> +     }
> +    }
> +  return res;
> +}
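
For reference, the target-key handling in init_clone_map above can be
sketched outside GCC as follows. This is a standalone model under my own
assumptions: arch_key, arch_table and clones_for are made-up names, and
std::vector<std::string> stands in for auto_vec<string_slice>.

```cpp
#include <map>
#include <string>
#include <vector>

// Keep everything before the first dash of a target name, so that
// "x86_64-linux-gnu" becomes "x86_64".
static std::string
arch_key (std::string target)
{
  std::string::size_type dash = target.find ('-');
  if (dash != std::string::npos)
    target.erase (dash);
  return target;
}

// Hypothetical in-memory form of one function's table entry:
// architecture name -> list of clone attributes.
using arch_table = std::map<std::string, std::vector<std::string>>;

// Return the clone list for TARGET, or an empty list if the entry has
// nothing for that architecture.
static std::vector<std::string>
clones_for (const arch_table &entry, const std::string &target)
{
  auto it = entry.find (arch_key (target));
  return it == entry.end () ? std::vector<std::string> () : it->second;
}
```

An entry with no key for the current architecture simply yields no clones,
which matches the "continue" in the loop above.
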
> +
>  static unsigned int
>  ipa_target_clone (bool early)
>  {
>    struct cgraph_node *node;
>    auto_vec<cgraph_node *> to_dispatch;
> +  std::map <std::string, auto_vec<string_slice> > clone_map
> +    = init_clone_map ();
>  
>    /* Don't need to do anything early for target attribute semantics.  */
>    if (early && TARGET_HAS_FMV_TARGET_ATTRIBUTE)
> @@ -582,7 +727,7 @@ ipa_target_clone (bool early)
>        the simple case.  Simple cases are dispatched in the later stage.  */
>  
>        if (early == !is_simple_target_clones_case (node))
> -     if (expand_target_clones (node, node->definition)
> +     if (expand_target_clones (node, node->definition, clone_map)
>           && TARGET_HAS_FMV_TARGET_ATTRIBUTE)
>         /* In non target_version semantics, dispatch all target clones.  */
>         to_dispatch.safe_push (node);
> diff --git a/gcc/testsuite/gcc.target/i386/tct-0.c b/gcc/testsuite/gcc.target/i386/tct-0.c
> new file mode 100644
> index 000000000000..0de0938e9fd0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/tct-0.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-ifunc "" } */
> +/* { dg-options "-ftarget-clones-table=${srcdir}/gcc.target/i386/tct-0.json" } */
> +/* { dg-final { scan-assembler "foo\.default" } } */
> +/* { dg-final { scan-assembler "foo\.arch_x86_64_v2" } } */
> +/* { dg-final { scan-assembler "foo\.arch_x86_64_v3" } } */
> +/* { dg-final { scan-assembler "foo\.arch_x86_64_v4" } } */
> +
> +void foo() {
> +
> +}
> diff --git a/gcc/testsuite/gcc.target/i386/tct-0.json b/gcc/testsuite/gcc.target/i386/tct-0.json
> new file mode 100644
> index 000000000000..9dc712e66e2b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/tct-0.json
> @@ -0,0 +1,5 @@
> +{
> +    "foo": {
> +        "x86_64": ["arch=x86-64-v2", "arch=x86-64-v3", "arch=x86-64-v4"]
> +    }
> +}
> -- 
> 2.51.0
>

-- 
Alfie Richards
