The 08/08/2025 17:44, Richard Sandiford wrote: > <alfie.richa...@arm.com> writes: > > From: Alfie Richards <alfie.richa...@arm.com> > > > > Adds an optimisation in FMV to redirect to a specific target if possible. > > > > A call is redirected to a specific target if both: > > - the caller can always call the callee version > > - and, it is possible to rule out all higher priority versions of the callee > > fmv set. That is estabilished either by the callee being the highest > > priority > > version, or each higher priority version of the callee implying that, > > were it > > resolved, a higher priority version of the caller would have been > > selected. > > > > For this logic, introduces the new > > TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A > > hook. Adds a full implementation for Aarch64, and a weaker default version > > for other targets. > > > > This allows the target to replace the previous optimisation as the new one > > is > > able to cover the same case where two function sets implement the same > > versions. > > Thanks for the update. The new semantics look good to me. > > > > > gcc/ChangeLog: > > > > * config/aarch64/aarch64.cc (aarch64_functions_b_resolvable_from_a): New > > function. > > (TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A): New define. > > * doc/tm.texi: Regenerate. > > * doc/tm.texi.in: Add documentation for version_a_implies_version_b. > > * multiple_target.cc (redirect_to_specific_clone): Add new optimisation > > logic. > > (ipa_target_clone): Remove check for TARGET_HAS_FMV_TARGET_ATTRIBUTE. > > * target.def: Document new hook.. > > * attribs.cc: (functions_b_resolvable_from_a) New function. > > * attribs.h: (functions_b_resolvable_from_a) New function. > > > > gcc/testsuite/ChangeLog: > > > > * g++.target/aarch64/fmv-selection1.C: New test. > > * g++.target/aarch64/fmv-selection2.C: New test. > > * g++.target/aarch64/fmv-selection3.C: New test. > > * g++.target/aarch64/fmv-selection4.C: New test. > > * g++.target/aarch64/fmv-selection5.C: New test. > > * g++.target/aarch64/fmv-selection6.C: New test. > > * g++.target/aarch64/fmv-selection7.C: New test. > > --- > > gcc/attribs.cc | 20 ++++ > > gcc/attribs.h | 1 + > > gcc/config/aarch64/aarch64.cc | 28 +++++ > > gcc/doc/tm.texi | 8 ++ > > gcc/doc/tm.texi.in | 2 + > > gcc/multiple_target.cc | 110 +++++++++++------- > > gcc/target.def | 11 ++ > > .../g++.target/aarch64/fmv-selection1.C | 40 +++++++ > > .../g++.target/aarch64/fmv-selection2.C | 40 +++++++ > > .../g++.target/aarch64/fmv-selection3.C | 25 ++++ > > .../g++.target/aarch64/fmv-selection4.C | 30 +++++ > > .../g++.target/aarch64/fmv-selection5.C | 28 +++++ > > .../g++.target/aarch64/fmv-selection6.C | 27 +++++ > > .../g++.target/aarch64/fmv-selection7.C | 65 +++++++++++ > > 14 files changed, 392 insertions(+), 43 deletions(-) > > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection1.C > > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection2.C > > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection3.C > > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection4.C > > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection5.C > > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection6.C > > create mode 100644 gcc/testsuite/g++.target/aarch64/fmv-selection7.C > > > > diff --git a/gcc/attribs.cc b/gcc/attribs.cc > > index 2ca82674f7c..fc1751a5f2e 100644 > > --- a/gcc/attribs.cc > > +++ b/gcc/attribs.cc > > @@ -1095,6 +1095,26 @@ common_function_versions (string_slice fn1 > > ATTRIBUTE_UNUSED, > > gcc_unreachable (); > > } > > > > +/* Default implementation of > > TARGET_OPTION_FUNCTION_B_RESOLVBLE_FROM_FUNCTION_A. > > TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A > > > + Used to check very basically if FN2 is callable from FN1. > > + For now this checks if the version strings are the same. */ > > + > > +bool > > +functions_b_resolvable_from_a (tree fn1, tree fn2, tree base > > ATTRIBUTE_UNUSED) > > Calling these decl_a and decl_b might be better, as for the aarch64 > version. I realise it doesn't matter here, since the function is > symmetric, but it would make it clearer what "a" and "b" refer to. > > > +{ > > + const char *attr_name = TARGET_HAS_FMV_TARGET_ATTRIBUTE > > + ? "target" > > + : "target_version"; > > + > > + tree attr1 = lookup_attribute (attr_name, DECL_ATTRIBUTES (fn1)); > > + tree attr2 = lookup_attribute (attr_name, DECL_ATTRIBUTES (fn2)); > > Motivated by the documentation suggestion below, I think we should > assert attr2 (attr_b). Same for the aarch64 version. > > > + > > + if (!attr1 || !attr2) > > + return false; > > + > > + return attribute_value_equal (attr1, attr2); > > +} > > + > > /* Comparator function to be used in qsort routine to sort attribute > > specification strings to "target". */ > > > > [...] > > diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi > > index 47755ad32eb..fddca8c83e0 100644 > > --- a/gcc/doc/tm.texi > > +++ b/gcc/doc/tm.texi > > @@ -10972,6 +10972,14 @@ This target hook returns @code{true} if the > > target/target-version strings > > @var{fn1} and @var{fn2} imply the same function version. > > @end deftypefn > > > > +@deftypefn {Target Hook} bool TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A > > (tree @var{v_a}, tree @var{v_b}, tree @var{base}) > > +Determines if a function with the arch implied by @var{v_b}'s > > Here too, I think decl_a and decl_b would be easier to follow. > > I think it would be worth expanding "arch" to "architecture". > > > +FMV version attributes (either @code{target} or @code{target_version} for > > function multi-versioning (FMV) > > > +target FMV or target_version FMV semantics respectively) would always be > > +resolvable from a function with @var{v_a}'s FMV version attributes. > > +Uses a function decl @var{base} to establish the baseline architecture. > > It would be good to expand on what the default is. > > How about: > > ------------------------------------------------------------------------------ > @var{decl_b} is a function declaration with a function multi-versioning (FMV) > attribute; this attribute is either @code{target} or @code{target_version}, > depending on @code{TARGET_HAS_FMV_TARGET_ATTRIBUTE}. @var{decl_a} is > a function declaration that may or may not have an FMV attribute. > > Return true if we have enough information to determine that the > requirements of @var{decl_b}'s FMV attribute are met whenever @var{decl_a} > is executed, given that the target supports all features required by > function declaration @var{base}. > > The default implementation just checks whether @var{decl_a} has the same > FMV attribute as @var{decl_b}. This is conservatively correct, > but ports can do better by taking the relationships between architecture > features into account. For example, on AArch64, @code{sve} is present > whenever @code{sve2} is present. > ------------------------------------------------------------------------------ > > Feel free to reword. It was just a way of listing the information > that seemed useful to include.
Thanks for the feedback! This is much better. > > (It looks like we're missing documentation for > TARGET_HAS_FMV_TARGET_ATTRIBUTE though.) Good spot. I will submit some documentation for that separately. > > > +@end deftypefn > > + > > @deftypefn {Target Hook} bool TARGET_CAN_INLINE_P (tree @var{caller}, tree > > @var{callee}) > > This target hook returns @code{false} if the @var{caller} function > > cannot inline @var{callee}, based on target specific information. By > > diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in > > index eaab62652a7..99f47cb9d78 100644 > > --- a/gcc/doc/tm.texi.in > > +++ b/gcc/doc/tm.texi.in > > @@ -7132,6 +7132,8 @@ with the target specific attributes. The default > > value is @code{','}. > > > > @hook TARGET_OPTION_FUNCTION_VERSIONS > > > > +@hook TARGET_OPTION_FUNCTIONS_B_RESOLVABLE_FROM_A > > + > > @hook TARGET_CAN_INLINE_P > > > > @hook TARGET_UPDATE_IPA_FN_TARGET_INFO > > diff --git a/gcc/multiple_target.cc b/gcc/multiple_target.cc > > index 7ab025d7d46..681b586e61c 100644 > > --- a/gcc/multiple_target.cc > > +++ b/gcc/multiple_target.cc > > @@ -423,61 +423,86 @@ expand_target_clones (struct cgraph_node *node, bool > > definition) > > return true; > > } > > > > -/* When NODE is a target clone, consider all callees and redirect > > - to a clone with equal target attributes. That prevents multiple > > - multi-versioning dispatches and a call-chain can be optimized. > > - > > - This optimisation might pick the wrong version in some cases, since > > knowing > > - that we meet the target requirements for a matching callee version does > > not > > - tell us that we won't also meet the target requirements for a higher > > - priority callee version at runtime. Since this is longstanding > > behaviour > > - for x86 and powerpc, we preserve it for those targets, but skip the > > optimisation > > - for targets that use the "target_version" attribute for > > multi-versioning. */ > > +/* When NODE is part of an FMV function set, consider all callees and > > check if > > + any can provably always resolve a certain version and then call that > > version > > + directly. */ > > > > static void > > redirect_to_specific_clone (cgraph_node *node) > > { > > - cgraph_function_version_info *fv = node->function_version (); > > - if (fv == NULL) > > + if (!targetm.compare_version_priority > > + || !targetm.target_option.functions_b_resolvable_from_a > > The condition on the last line should always be true. > > > + || !optimize) > > return; > > > > - gcc_assert (TARGET_HAS_FMV_TARGET_ATTRIBUTE); > > - tree attr_target = lookup_attribute ("target", DECL_ATTRIBUTES > > (node->decl)); > > - if (attr_target == NULL_TREE) > > - return; > > - > > - /* We need to remember NEXT_CALLER as it could be modified in the loop. > > */ > > + /* We need to remember NEXT_CALLER as it could be modified in the > > + loop. */ > > There's no need to reflow this comment. > > > for (cgraph_edge *e = node->callees; e ; e = e->next_callee) > > { > > - cgraph_function_version_info *fv2 = e->callee->function_version (); > > - if (!fv2) > > + /* Only if this is a call to a dispatched symbol. */ > > + if (!e->callee->dispatcher_function) > > continue; > > > > - tree attr_target2 = lookup_attribute ("target", > > - DECL_ATTRIBUTES (e->callee->decl)); > > + cgraph_function_version_info *callee_v > > + = e->callee->function_version (); > > + cgraph_function_version_info *caller_v > > + = e->caller->function_version (); > > > > - /* Function is not calling proper target clone. */ > > - if (attr_target2 == NULL_TREE > > - || !attribute_value_equal (attr_target, attr_target2)) > > - { > > - while (fv2->prev != NULL) > > - fv2 = fv2->prev; > > + /* If this is not the TU that contains the definition of the default > > + version we are not guaranteed to have visibility of all versions > > + so cannot reason about them. */ > > + if (!callee_v->next->this_node->binds_to_current_def_p ()) > > + continue; > > > > - /* Try to find a clone with equal target attribute. */ > > - for (; fv2 != NULL; fv2 = fv2->next) > > - { > > - cgraph_node *callee = fv2->this_node; > > - attr_target2 = lookup_attribute ("target", > > - DECL_ATTRIBUTES (callee->decl)); > > - if (attr_target2 != NULL_TREE > > - && attribute_value_equal (attr_target, attr_target2)) > > + cgraph_function_version_info *highest_callable_fn = NULL; > > + for (cgraph_function_version_info *ver = callee_v->next; > > + ver; > > + ver = ver->next) > > + if (targetm.target_option.functions_b_resolvable_from_a > > + (node->decl, ver->this_node->decl, node->decl)) > > + highest_callable_fn = ver; > > + > > + if (!highest_callable_fn) > > + continue; > > + > > + /* If there are higher priority versions of callee and caller has no > > + more version information, then not callable. */ > > + if (!caller_v && highest_callable_fn->next) > > + continue; > > I think this should be: > > if (highest_callable_fn->next > && (!caller_v > || !caller_v->next->this_node->binds_to_current_def_p ())) > continue; This logic isn't quite correct. My previous check that `if (!callee_v->next->this_node->binds_to_current_def_p ())` relies on the fact that callee_v is the dispatched symbol so the default node must be in this TU and must be the lowest priority so must be next. So I need slightly more complex logic to get the defauld caller node and to check if its in this TU at all. I will resubmit with that minor logic addition. > > We don't need to check whether node->binds_to_current_def_p (), > because if node is preempted/overridden, then it doesn't matter > what node does. But for highest_callable_fn->next we're assuming > that all of caller's versions are visible in the current TU. Agreed. > > > + > > + bool inlinable = true; > > + /* If every higher priority version would imply a higher priority > > + version of caller would have been selected, then this is > > + callable. */ > > + if (caller_v) > > + for (cgraph_function_version_info *callee_ver > > + = highest_callable_fn->next; > > + callee_ver; > > + callee_ver = callee_ver->next) > > + { > > + bool is_possible = true; > > + for (cgraph_function_version_info *caller_ver = caller_v->next; > > + caller_ver; > > + caller_ver = caller_ver->next) > > + if (targetm.target_option.functions_b_resolvable_from_a > > + (callee_ver->this_node->decl, > > + caller_ver->this_node->decl, > > + node->decl)) > > { > > - e->redirect_callee (callee); > > - cgraph_edge::redirect_call_stmt_to_callee (e); > > + is_possible = false; > > break; > > } > > - } > > - } > > + if (is_possible) > > + { > > + inlinable = false; > > + break; > > + } > > + } > > + if (!inlinable) > > + continue; > > I suppose this is personal preference, but I think the flow would > be easier to follow if the loop was conditional on highest_callable_fn->next. > Putting that together: > > if (highest_callable_fn->next) > { > /* If there are higher priority versions of callee and caller has no > more version information, then not callable. */ > if (!caller_v > || !caller_v->next->this_node->binds_to_current_def_p ()) > continue; > > /* If every higher priority version would imply a higher priority > version of caller would have been selected, then this is > callable. */ > for (cgraph_function_version_info *callee_ver > = highest_callable_fn->next; > callee_ver; > callee_ver = callee_ver->next) > > LGTM with those changes. > > Thanks, > Richard > > > + > > + e->redirect_callee (highest_callable_fn->this_node); > > + cgraph_edge::redirect_call_stmt_to_callee (e); > > } > > } > > -- Alfie Richards