> -----Original Message----- > From: Soumya AR <[email protected]> > Sent: 13 October 2025 09:44 > To: Tamar Christina <[email protected]> > Cc: [email protected]; Kyrylo Tkachov <[email protected]>; > [email protected]; [email protected] > Subject: Re: [v3 PATCH 0/6] aarch64: Support for user-defined aarch64 tuning > Importance: High > > > > > On 29 Sep 2025, at 5:20 PM, Tamar Christina <[email protected]> > wrote: > > > > External email: Use caution opening links or attachments > > > > > >> -----Original Message----- > >> From: [email protected] <[email protected]> > >> Sent: 25 August 2025 12:00 > >> To: [email protected] > >> Cc: [email protected]; [email protected]; > [email protected] > >> Subject: [v3 PATCH 0/6] aarch64: Support for user-defined aarch64 tuning > >> > >> From: Soumya AR <[email protected]> > >> > >> Hi, > >> > >> This is a follow-up to suggestions given on v2 of the aarch64 > >> user-defined tunings. > >> > >> https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689893.html > >> Attaching an updated patch series here. > >> > >> Regarding some questions on this thread: > >> > >> I've made the changes suggested by Andrew and Richard. Thanks a lot for > >> taking > >> a look! > >> > >>> Maybe you could do: > >>> ``` > >>> template<typename T> > >>> using serialize_func_type = std::unique_ptr<json::object> > >>> (*serialize_func) (const typename std::remove_pointer<T>::type &); > >>> > >>> serialize_object_helper ( > >>> const T &member, serialize_func_type<T> serialize_func) > >>> ``` > >>> Since std::remove_pointer<T>::type will T if T is not a pointer. > >> > >> Right, this is much better. I've updated both serialize_object_helper and > >> parse_object_helper to do this. > >> > >>>> + function_map = {} > >>>> + for path_key, (path, obj_schema) in all_objects_with_paths.items(): > >>>> + filtered_path = [p for p in path if p != "tune_params"] > >>>> + if filtered_path: > >>>> + full_name = "_".join(filtered_path) > >>>> + function_map[path_key] = f"{operation}_{full_name}" > >>>> + else: > >>>> + function_map[path_key] = f"{operation}_{path_key}" > >>> > >>> Are the 6 lines above equivalent to: > >>> > >>> if len(path) > 1: > >>> full_name = "_".join(path[1:]) > >>> function_map[path_key] = f"{operation}_{full_name}" > >>> else: > >>> function_map[path_key] = f"{operation}_{path_key}" > >>> > >>> ? If so, that might be easier to follow, since it emphasises that > >>> we're structurally ignoring path[0] for all cases. It looked at first > >>> like the filtering might remove something other than the first element. > >> > >> I actually had this check to avoid the case for when we encounter the > >> "tune_params" key. On a second look, I realized I was taking care of that > case > >> earlier anyway, so the check isn't needed at all. Now it's just: > >> > >> for path_key, (path, obj_schema) in all_objects_with_paths.items(): > >> if path: > >> full_name = "_".join(path) > >> function_map[path_key] = f"{operation}_{full_name}" > >> else: > >> function_map[path_key] = f"{operation}_{path_key}" > >> > >> > >>> How is memory management handled (both here and for pointers to > >>> structures)? Do we just allocate and never free? If so, that's ok for > >>> command-line options, but I wonder whether we'll ewventually be using > >>> this on a per-function basis. (Hopefully not though...) > >> > >> About this, I addressed it earlier in a separate reply: > >> https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689902.html > >> > >> Pasting the relevant information from that thread. > >> > >> Possible solutions: > >> 1. Simply don't bother freeing up the memory and let the memory leak be :) > >> 2. Use the GCC garbage collector > >> 3. Remove the pointers and have all the nested structures be embedded > since > >> I'm not too sure why this is inconsistent in tune_params. > >> 4. (The current approach) Have each temporary structure be a static local > >> variable. This works, but operates under the assumption that nested > >> structures > >> in tune_params will always have unique types, since the variable > declaration is > >> a template that is resolved by the type of the nested structures. There is > >> an > >> extra check added to ensure we don't overwrite accidentally, in the case we > >> ever end up having structure members of the same type. This is still a bit > ugly > >> though, so I would appreciate if there are other suggestions on how to > handle > >> this. > >> > >>> I agree with Andrew that it would be nice in principle to be able > >>> to autogenerate this. I'm personally happy without that while it > >>> remains this small, but I think it would at least be worth sharing > >>> this between the parsing and serialising code, rather than having > >>> separate lists for each. > >> > >> Auto-generating this would not be a problem, but my main agenda with > the > >> script > >> was to have a setup that was only dependent on the JSON schema and not > >> have any > >> actual references to the tuning parameters. Essentially, make the script > >> as agnostic of the tuning parameters as possible, so anyone modifying the > >> tuning > >> paramters does not need to touch the script (and only modify the schema). > >> > >> Extending that philosophy to the enums, I now have the following setup: > >> > >> I've centralized the tuning enum definitions into a new aarch64-tuning- > >> enums.def > >> file. The enums (autoprefetch_model and ldp_stp_policy) are now defined > >> once > >> using macros and consumed by both the core definitions (aarch64-opts.h, > >> aarch64-protos.h) and the generated parser/printer code. This ensures that > if > >> someone wants to add a new enum value, they only need to make > >> modifications in > >> the .def file. The codegen from the script refers to the same enums. > >> > >> If this is excessive, I can just have the script directly generate the > >> enum to > >> string maps. Let me know what you think. > >> > >>> Could you explain the reason for Negative(fdump-tuning-model=) and > >> ToLower? > >> > >> ToLower is incorrect here, I've removed it. We should accept any filename > for > >> the user provided files. Thanks for pointing this out. > >> > >> As I understand it, Negative() will prune all earlier instances of the > >> option, > >> yes? I suppose in any case, even if I don't specify Negative(), then the > second > >> fdump-tuning-model= would override the first one? At that point, it seems > >> better > >> to not expose the earlier option at all..? > >> > >> Similar logic for Negative(muser-provided-CPU=), where we would only like > to > >> parse one file. > >> > >>> The option should be documented in invoke.texi, or marked > Undocumented > >> if > >>> we really don't want to put it in the manual. > >> > >> Not too sure about this one. I've marked it Undocumented for now but I > can > >> see > >> intances of options that are marked Undocumented but still specified in > >> invoke.texi. Should we document its usage anyway in invoke.texi? > >> Also, should we have something similar to -moverride, where we specify > that > >> this > >> feature is only for power users? > > > > Hi Soumya, > > > > I'm still catching up on the series. So I'll come back to the questions > > above > > once I finish going through all the patches. > > > > But Yeah I think we want something like this to indicate it's for users whom > > really know what they're doing. > > > > Now to that end as I'm going through the series I had a question. > > > > What's the interaction model here between this option and push/pop > > pragmas or attributes that set a tuning/mcpu. > > > > At the moment the flag is handled in override_options which means > > It will override all cost models. > > > > e.g. > > > > __attribute__ ((target("cpu=neoverse-n1"))) > > void foo () > > {} > > > > Would be overridden. Is this the intention, or did we only want to > > override the default/global cost model with this option? > > Thanks for bringing this up, I hadn't thought about target attributes when I > did > implement it. As I understand, if override_options is given hierarchy over > push/ > pop pragmas or attributes, then it's fair to say that we want CLI to have > priority? > > Then, in a similar vein, -muser-provided-CPU=neoverse-v2.json should have > priority over __attribute__ ((target("cpu=neoverse-n1"))). > > Another point of contention here is -moverride itself, I suppose this should > override -muser-provided-CPU, which it doesn't in the current > implementation. > > I think then what makes sense to me in order of hierarchy is: > > -moverride > -muser-provided-CPU > attributes/pragmas >
Yeah normally the preference is the other way, but for this particular feature I agree, since the options do partial overriding the correct semantics seems to be to perform the operation in this order. So that's ok with me unless anyone else has an objection. Thanks, Tamar > What do you think? > > Thanks, > Soumya > > > > > Thanks, > > Tamar > >> > >> CCing Richard's personal email here (in case you would still like to give > >> feedback on this). Thank you for all your help with this feature, as well > >> as all > >> of my other work! :) > >> > >> Best, > >> Soumya > >> > >> Soumya AR (6): > >> aarch64 + arm: Remove const keyword from tune_params members and > >> nested members > >> aarch64: Enable dumping of AArch64 CPU tuning parameters to JSON > >> json: Add get_map() method to JSON object class > >> aarch64: Enable parsing of user-provided AArch64 CPU tuning parameters > >> aarch64: Regression tests for parsing of user-provided AArch64 CPU > >> tuning parameters > >> aarch64: Script to auto generate JSON tuning routines > >> > >> gcc/config.gcc | 2 +- > >> .../aarch64-generate-json-tuning-routines.py | 374 +++++++++++++ > >> gcc/config/aarch64/aarch64-json-schema.h | 261 +++++++++ > >> .../aarch64-json-tunings-parser-generated.inc | 355 ++++++++++++ > >> .../aarch64/aarch64-json-tunings-parser.cc | 527 ++++++++++++++++++ > >> .../aarch64/aarch64-json-tunings-parser.h | 29 + > >> ...aarch64-json-tunings-printer-generated.inc | 439 +++++++++++++++ > >> .../aarch64/aarch64-json-tunings-printer.cc | 137 +++++ > >> .../aarch64/aarch64-json-tunings-printer.h | 28 + > >> gcc/config/aarch64/aarch64-opts.h | 6 +- > >> gcc/config/aarch64/aarch64-protos.h | 169 +++--- > >> gcc/config/aarch64/aarch64-tuning-enums.def | 37 ++ > >> gcc/config/aarch64/aarch64.cc | 20 + > >> gcc/config/aarch64/aarch64.opt | 8 + > >> gcc/config/aarch64/t-aarch64 | 21 + > >> gcc/config/arm/aarch-common-protos.h | 128 ++--- > >> gcc/json.h | 26 +- > >> gcc/selftest-run-tests.cc | 1 + > >> gcc/selftest.h | 1 + > >> .../aarch64-json-tunings.exp | 35 ++ > >> .../aarch64/aarch64-json-tunings/boolean-1.c | 6 + > >> .../aarch64-json-tunings/boolean-1.json | 9 + > >> .../aarch64/aarch64-json-tunings/boolean-2.c | 7 + > >> .../aarch64-json-tunings/boolean-2.json | 9 + > >> .../aarch64-json-tunings/empty-brackets.c | 6 + > >> .../aarch64-json-tunings/empty-brackets.json | 1 + > >> .../aarch64/aarch64-json-tunings/empty.c | 6 + > >> .../aarch64/aarch64-json-tunings/empty.json | 0 > >> .../aarch64/aarch64-json-tunings/enum-1.c | 8 + > >> .../aarch64/aarch64-json-tunings/enum-1.json | 7 + > >> .../aarch64/aarch64-json-tunings/enum-2.c | 7 + > >> .../aarch64/aarch64-json-tunings/enum-2.json | 7 + > >> .../aarch64/aarch64-json-tunings/integer-1.c | 7 + > >> .../aarch64-json-tunings/integer-1.json | 6 + > >> .../aarch64/aarch64-json-tunings/integer-2.c | 7 + > >> .../aarch64-json-tunings/integer-2.json | 6 + > >> .../aarch64/aarch64-json-tunings/integer-3.c | 7 + > >> .../aarch64-json-tunings/integer-3.json | 5 + > >> .../aarch64/aarch64-json-tunings/integer-4.c | 6 + > >> .../aarch64-json-tunings/integer-4.json | 5 + > >> .../aarch64/aarch64-json-tunings/string-1.c | 8 + > >> .../aarch64-json-tunings/string-1.json | 7 + > >> .../aarch64/aarch64-json-tunings/string-2.c | 7 + > >> .../aarch64-json-tunings/string-2.json | 5 + > >> .../aarch64/aarch64-json-tunings/test-all.c | 58 ++ > >> .../aarch64-json-tunings/test-all.json | 39 ++ > >> .../aarch64-json-tunings/unidentified-key.c | 6 + > >> .../unidentified-key.json | 5 + > >> 48 files changed, 2705 insertions(+), 156 deletions(-) > >> create mode 100644 gcc/config/aarch64/aarch64-generate-json-tuning- > >> routines.py > >> create mode 100644 gcc/config/aarch64/aarch64-json-schema.h > >> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser- > >> generated.inc > >> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser.cc > >> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser.h > >> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer- > >> generated.inc > >> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer.cc > >> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer.h > >> create mode 100644 gcc/config/aarch64/aarch64-tuning-enums.def > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/aarch64-json-tunings.exp > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/boolean-1.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/boolean-1.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/boolean-2.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/boolean-2.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/empty-brackets.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/empty-brackets.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/empty.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/empty.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/enum-1.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/enum-1.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/enum-2.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/enum-2.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/integer-1.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/integer-1.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/integer-2.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/integer-2.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/integer-3.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/integer-3.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/integer-4.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/integer-4.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/string-1.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/string-1.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/string-2.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/string-2.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/test-all.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/test-all.json > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/unidentified-key.c > >> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json- > >> tunings/unidentified-key.json > >> > >> -- > >> 2.44.0 >
