> On 29 Sep 2025, at 5:20 PM, Tamar Christina <[email protected]> wrote:
>
> External email: Use caution opening links or attachments
>
>
>> -----Original Message-----
>> From: [email protected] <[email protected]>
>> Sent: 25 August 2025 12:00
>> To: [email protected]
>> Cc: [email protected]; [email protected]; [email protected]
>> Subject: [v3 PATCH 0/6] aarch64: Support for user-defined aarch64 tuning
>>
>> From: Soumya AR <[email protected]>
>>
>> Hi,
>>
>> This is a follow-up to suggestions given on v2 of the aarch64
>> user-defined tunings.
>>
>> https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689893.html
>> Attaching an updated patch series here.
>>
>> Regarding some questions on this thread:
>>
>> I've made the changes suggested by Andrew and Richard. Thanks a lot for
>> taking
>> a look!
>>
>>> Maybe you could do:
>>> ```
>>> template<typename T>
>>> using serialize_func_type = std::unique_ptr<json::object>
>>> (*serialize_func) (const typename std::remove_pointer<T>::type &);
>>>
>>> serialize_object_helper (
>>> const T &member, serialize_func_type<T> serialize_func)
>>> ```
>>> Since std::remove_pointer<T>::type will T if T is not a pointer.
>>
>> Right, this is much better. I've updated both serialize_object_helper and
>> parse_object_helper to do this.
>>
>>>> + function_map = {}
>>>> + for path_key, (path, obj_schema) in all_objects_with_paths.items():
>>>> + filtered_path = [p for p in path if p != "tune_params"]
>>>> + if filtered_path:
>>>> + full_name = "_".join(filtered_path)
>>>> + function_map[path_key] = f"{operation}_{full_name}"
>>>> + else:
>>>> + function_map[path_key] = f"{operation}_{path_key}"
>>>
>>> Are the 6 lines above equivalent to:
>>>
>>> if len(path) > 1:
>>> full_name = "_".join(path[1:])
>>> function_map[path_key] = f"{operation}_{full_name}"
>>> else:
>>> function_map[path_key] = f"{operation}_{path_key}"
>>>
>>> ? If so, that might be easier to follow, since it emphasises that
>>> we're structurally ignoring path[0] for all cases. It looked at first
>>> like the filtering might remove something other than the first element.
>>
>> I actually had this check to avoid the case for when we encounter the
>> "tune_params" key. On a second look, I realized I was taking care of that
>> case
>> earlier anyway, so the check isn't needed at all. Now it's just:
>>
>> for path_key, (path, obj_schema) in all_objects_with_paths.items():
>> if path:
>> full_name = "_".join(path)
>> function_map[path_key] = f"{operation}_{full_name}"
>> else:
>> function_map[path_key] = f"{operation}_{path_key}"
>>
>>
>>> How is memory management handled (both here and for pointers to
>>> structures)? Do we just allocate and never free? If so, that's ok for
>>> command-line options, but I wonder whether we'll ewventually be using
>>> this on a per-function basis. (Hopefully not though...)
>>
>> About this, I addressed it earlier in a separate reply:
>> https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689902.html
>>
>> Pasting the relevant information from that thread.
>>
>> Possible solutions:
>> 1. Simply don't bother freeing up the memory and let the memory leak be :)
>> 2. Use the GCC garbage collector
>> 3. Remove the pointers and have all the nested structures be embedded since
>> I'm not too sure why this is inconsistent in tune_params.
>> 4. (The current approach) Have each temporary structure be a static local
>> variable. This works, but operates under the assumption that nested
>> structures
>> in tune_params will always have unique types, since the variable declaration
>> is
>> a template that is resolved by the type of the nested structures. There is an
>> extra check added to ensure we don't overwrite accidentally, in the case we
>> ever end up having structure members of the same type. This is still a bit
>> ugly
>> though, so I would appreciate if there are other suggestions on how to handle
>> this.
>>
>>> I agree with Andrew that it would be nice in principle to be able
>>> to autogenerate this. I'm personally happy without that while it
>>> remains this small, but I think it would at least be worth sharing
>>> this between the parsing and serialising code, rather than having
>>> separate lists for each.
>>
>> Auto-generating this would not be a problem, but my main agenda with the
>> script
>> was to have a setup that was only dependent on the JSON schema and not
>> have any
>> actual references to the tuning parameters. Essentially, make the script
>> as agnostic of the tuning parameters as possible, so anyone modifying the
>> tuning
>> paramters does not need to touch the script (and only modify the schema).
>>
>> Extending that philosophy to the enums, I now have the following setup:
>>
>> I've centralized the tuning enum definitions into a new aarch64-tuning-
>> enums.def
>> file. The enums (autoprefetch_model and ldp_stp_policy) are now defined
>> once
>> using macros and consumed by both the core definitions (aarch64-opts.h,
>> aarch64-protos.h) and the generated parser/printer code. This ensures that if
>> someone wants to add a new enum value, they only need to make
>> modifications in
>> the .def file. The codegen from the script refers to the same enums.
>>
>> If this is excessive, I can just have the script directly generate the enum
>> to
>> string maps. Let me know what you think.
>>
>>> Could you explain the reason for Negative(fdump-tuning-model=) and
>> ToLower?
>>
>> ToLower is incorrect here, I've removed it. We should accept any filename for
>> the user provided files. Thanks for pointing this out.
>>
>> As I understand it, Negative() will prune all earlier instances of the
>> option,
>> yes? I suppose in any case, even if I don't specify Negative(), then the
>> second
>> fdump-tuning-model= would override the first one? At that point, it seems
>> better
>> to not expose the earlier option at all..?
>>
>> Similar logic for Negative(muser-provided-CPU=), where we would only like to
>> parse one file.
>>
>>> The option should be documented in invoke.texi, or marked Undocumented
>> if
>>> we really don't want to put it in the manual.
>>
>> Not too sure about this one. I've marked it Undocumented for now but I can
>> see
>> intances of options that are marked Undocumented but still specified in
>> invoke.texi. Should we document its usage anyway in invoke.texi?
>> Also, should we have something similar to -moverride, where we specify that
>> this
>> feature is only for power users?
>
> Hi Soumya,
>
> I'm still catching up on the series. So I'll come back to the questions above
> once I finish going through all the patches.
>
> But Yeah I think we want something like this to indicate it's for users whom
> really know what they're doing.
>
> Now to that end as I'm going through the series I had a question.
>
> What's the interaction model here between this option and push/pop
> pragmas or attributes that set a tuning/mcpu.
>
> At the moment the flag is handled in override_options which means
> It will override all cost models.
>
> e.g.
>
> __attribute__ ((target("cpu=neoverse-n1")))
> void foo ()
> {}
>
> Would be overridden. Is this the intention, or did we only want to
> override the default/global cost model with this option?
Thanks for bringing this up, I hadn't thought about target attributes when I did
implement it. As I understand, if override_options is given hierarchy over push/
pop pragmas or attributes, then it's fair to say that we want CLI to have
priority?
Then, in a similar vein, -muser-provided-CPU=neoverse-v2.json should have
priority over __attribute__ ((target("cpu=neoverse-n1"))).
Another point of contention here is -moverride itself, I suppose this should
override -muser-provided-CPU, which it doesn't in the current implementation.
I think then what makes sense to me in order of hierarchy is:
-moverride > -muser-provided-CPU > attributes/pragmas
What do you think?
Thanks,
Soumya
>
> Thanks,
> Tamar
>>
>> CCing Richard's personal email here (in case you would still like to give
>> feedback on this). Thank you for all your help with this feature, as well as
>> all
>> of my other work! :)
>>
>> Best,
>> Soumya
>>
>> Soumya AR (6):
>> aarch64 + arm: Remove const keyword from tune_params members and
>> nested members
>> aarch64: Enable dumping of AArch64 CPU tuning parameters to JSON
>> json: Add get_map() method to JSON object class
>> aarch64: Enable parsing of user-provided AArch64 CPU tuning parameters
>> aarch64: Regression tests for parsing of user-provided AArch64 CPU
>> tuning parameters
>> aarch64: Script to auto generate JSON tuning routines
>>
>> gcc/config.gcc | 2 +-
>> .../aarch64-generate-json-tuning-routines.py | 374 +++++++++++++
>> gcc/config/aarch64/aarch64-json-schema.h | 261 +++++++++
>> .../aarch64-json-tunings-parser-generated.inc | 355 ++++++++++++
>> .../aarch64/aarch64-json-tunings-parser.cc | 527 ++++++++++++++++++
>> .../aarch64/aarch64-json-tunings-parser.h | 29 +
>> ...aarch64-json-tunings-printer-generated.inc | 439 +++++++++++++++
>> .../aarch64/aarch64-json-tunings-printer.cc | 137 +++++
>> .../aarch64/aarch64-json-tunings-printer.h | 28 +
>> gcc/config/aarch64/aarch64-opts.h | 6 +-
>> gcc/config/aarch64/aarch64-protos.h | 169 +++---
>> gcc/config/aarch64/aarch64-tuning-enums.def | 37 ++
>> gcc/config/aarch64/aarch64.cc | 20 +
>> gcc/config/aarch64/aarch64.opt | 8 +
>> gcc/config/aarch64/t-aarch64 | 21 +
>> gcc/config/arm/aarch-common-protos.h | 128 ++---
>> gcc/json.h | 26 +-
>> gcc/selftest-run-tests.cc | 1 +
>> gcc/selftest.h | 1 +
>> .../aarch64-json-tunings.exp | 35 ++
>> .../aarch64/aarch64-json-tunings/boolean-1.c | 6 +
>> .../aarch64-json-tunings/boolean-1.json | 9 +
>> .../aarch64/aarch64-json-tunings/boolean-2.c | 7 +
>> .../aarch64-json-tunings/boolean-2.json | 9 +
>> .../aarch64-json-tunings/empty-brackets.c | 6 +
>> .../aarch64-json-tunings/empty-brackets.json | 1 +
>> .../aarch64/aarch64-json-tunings/empty.c | 6 +
>> .../aarch64/aarch64-json-tunings/empty.json | 0
>> .../aarch64/aarch64-json-tunings/enum-1.c | 8 +
>> .../aarch64/aarch64-json-tunings/enum-1.json | 7 +
>> .../aarch64/aarch64-json-tunings/enum-2.c | 7 +
>> .../aarch64/aarch64-json-tunings/enum-2.json | 7 +
>> .../aarch64/aarch64-json-tunings/integer-1.c | 7 +
>> .../aarch64-json-tunings/integer-1.json | 6 +
>> .../aarch64/aarch64-json-tunings/integer-2.c | 7 +
>> .../aarch64-json-tunings/integer-2.json | 6 +
>> .../aarch64/aarch64-json-tunings/integer-3.c | 7 +
>> .../aarch64-json-tunings/integer-3.json | 5 +
>> .../aarch64/aarch64-json-tunings/integer-4.c | 6 +
>> .../aarch64-json-tunings/integer-4.json | 5 +
>> .../aarch64/aarch64-json-tunings/string-1.c | 8 +
>> .../aarch64-json-tunings/string-1.json | 7 +
>> .../aarch64/aarch64-json-tunings/string-2.c | 7 +
>> .../aarch64-json-tunings/string-2.json | 5 +
>> .../aarch64/aarch64-json-tunings/test-all.c | 58 ++
>> .../aarch64-json-tunings/test-all.json | 39 ++
>> .../aarch64-json-tunings/unidentified-key.c | 6 +
>> .../unidentified-key.json | 5 +
>> 48 files changed, 2705 insertions(+), 156 deletions(-)
>> create mode 100644 gcc/config/aarch64/aarch64-generate-json-tuning-
>> routines.py
>> create mode 100644 gcc/config/aarch64/aarch64-json-schema.h
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser-
>> generated.inc
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser.cc
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser.h
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer-
>> generated.inc
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer.cc
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer.h
>> create mode 100644 gcc/config/aarch64/aarch64-tuning-enums.def
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/aarch64-json-tunings.exp
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/boolean-1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/boolean-1.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/boolean-2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/boolean-2.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/empty-brackets.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/empty-brackets.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/empty.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/empty.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/enum-1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/enum-1.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/enum-2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/enum-2.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-1.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-2.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-3.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-3.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-4.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-4.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/string-1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/string-1.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/string-2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/string-2.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/test-all.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/test-all.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/unidentified-key.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/unidentified-key.json
>>
>> --
>> 2.44.0