> On 29 Sep 2025, at 5:20 PM, Tamar Christina <[email protected]> wrote:
> 
> External email: Use caution opening links or attachments
> 
> 
>> -----Original Message-----
>> From: [email protected] <[email protected]>
>> Sent: 25 August 2025 12:00
>> To: [email protected]
>> Cc: [email protected]; [email protected]; [email protected]
>> Subject: [v3 PATCH 0/6] aarch64: Support for user-defined aarch64 tuning
>> 
>> From: Soumya AR <[email protected]>
>> 
>> Hi,
>> 
>> This is a follow-up to suggestions given on v2 of the aarch64
>> user-defined tunings.
>> 
>> https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689893.html
>> Attaching an updated patch series here.
>> 
>> Regarding some questions on this thread:
>> 
>> I've made the changes suggested by Andrew and Richard. Thanks a lot for
>> taking
>> a look!
>> 
>>> Maybe you could do:
>>> ```
>>> template<typename T>
>>> using serialize_func_type = std::unique_ptr<json::object>
>>> (*serialize_func) (const typename std::remove_pointer<T>::type &);
>>> 
>>> serialize_object_helper (
>>> const T &member, serialize_func_type<T>  serialize_func)
>>> ```
>>> Since std::remove_pointer<T>::type will T if T is not a pointer.
>> 
>> Right, this is much better. I've updated both serialize_object_helper and
>> parse_object_helper to do this.
>> 
>>>> +    function_map = {}
>>>> +    for path_key, (path, obj_schema) in all_objects_with_paths.items():
>>>> +        filtered_path = [p for p in path if p != "tune_params"]
>>>> +        if filtered_path:
>>>> +            full_name = "_".join(filtered_path)
>>>> +            function_map[path_key] = f"{operation}_{full_name}"
>>>> +        else:
>>>> +            function_map[path_key] = f"{operation}_{path_key}"
>>> 
>>> Are the 6 lines above equivalent to:
>>> 
>>>   if len(path) > 1:
>>>       full_name = "_".join(path[1:])
>>>       function_map[path_key] = f"{operation}_{full_name}"
>>>   else:
>>>       function_map[path_key] = f"{operation}_{path_key}"
>>> 
>>> ?  If so, that might be easier to follow, since it emphasises that
>>> we're structurally ignoring path[0] for all cases.  It looked at first
>>> like the filtering might remove something other than the first element.
>> 
>> I actually had this check to avoid the case for when we encounter the
>> "tune_params" key. On a second look, I realized I was taking care of that 
>> case
>> earlier anyway, so the check isn't needed at all. Now it's just:
>> 
>>    for path_key, (path, obj_schema) in all_objects_with_paths.items():
>>        if path:
>>            full_name = "_".join(path)
>>            function_map[path_key] = f"{operation}_{full_name}"
>>        else:
>>            function_map[path_key] = f"{operation}_{path_key}"
>> 
>> 
>>> How is memory management handled (both here and for pointers to
>>> structures)?  Do we just allocate and never free?  If so, that's ok for
>>> command-line options, but I wonder whether we'll ewventually be using
>>> this on a per-function basis.  (Hopefully not though...)
>> 
>> About this, I addressed it earlier in a separate reply:
>> https://gcc.gnu.org/pipermail/gcc-patches/2025-July/689902.html
>> 
>> Pasting the relevant information from that thread.
>> 
>> Possible solutions:
>> 1. Simply don't bother freeing up the memory and let the memory leak be :)
>> 2. Use the GCC garbage collector
>> 3. Remove the pointers and have all the nested structures be embedded since
>> I'm not too sure why this is inconsistent in tune_params.
>> 4. (The current approach) Have each temporary structure be a static local
>> variable. This works, but operates under the assumption that nested
>> structures
>> in tune_params will always have unique types, since the variable declaration 
>> is
>> a template that is resolved by the type of the nested structures. There is an
>> extra check added to ensure we don't overwrite accidentally, in the case we
>> ever end up having structure members of the same type. This is still a bit 
>> ugly
>> though, so I would appreciate if there are other suggestions on how to handle
>> this.
>> 
>>> I agree with Andrew that it would be nice in principle to be able
>>> to autogenerate this.  I'm personally happy without that while it
>>> remains this small, but I think it would at least be worth sharing
>>> this between the parsing and serialising code, rather than having
>>> separate lists for each.
>> 
>> Auto-generating this would not be a problem, but my main agenda with the
>> script
>> was to have a setup that was only dependent on the JSON schema and not
>> have any
>> actual references to the tuning parameters. Essentially, make the script
>> as agnostic of the tuning parameters as possible, so anyone modifying the
>> tuning
>> paramters does not need to touch the script (and only modify the schema).
>> 
>> Extending that philosophy to the enums, I now have the following setup:
>> 
>> I've centralized the tuning enum definitions into a new aarch64-tuning-
>> enums.def
>> file. The enums (autoprefetch_model and ldp_stp_policy) are now defined
>> once
>> using macros and consumed by both the core definitions (aarch64-opts.h,
>> aarch64-protos.h) and the generated parser/printer code. This ensures that if
>> someone wants to add a new enum value, they only need to make
>> modifications in
>> the .def file. The codegen from the script refers to the same enums.
>> 
>> If this is excessive, I can just have the script directly generate the enum 
>> to
>> string maps. Let me know what you think.
>> 
>>> Could you explain the reason for Negative(fdump-tuning-model=) and
>> ToLower?
>> 
>> ToLower is incorrect here, I've removed it. We should accept any filename for
>> the user provided files. Thanks for pointing this out.
>> 
>> As I understand it, Negative() will prune all earlier instances of the 
>> option,
>> yes? I suppose in any case, even if I don't specify Negative(), then the 
>> second
>> fdump-tuning-model= would override the first one? At that point, it seems
>> better
>> to not expose the earlier option at all..?
>> 
>> Similar logic for Negative(muser-provided-CPU=), where we would only like to
>> parse one file.
>> 
>>> The option should be documented in invoke.texi, or marked Undocumented
>> if
>>> we really don't want to put it in the manual.
>> 
>> Not too sure about this one. I've marked it Undocumented for now but I can
>> see
>> intances of options that are marked Undocumented but still specified in
>> invoke.texi. Should we document its usage anyway in invoke.texi?
>> Also, should we have something similar to -moverride, where we specify that
>> this
>> feature is only for power users?
> 
> Hi Soumya,
> 
> I'm still catching up on the series. So I'll come back to the questions above
> once I finish going through all the patches.
> 
> But Yeah I think we want something like this to indicate it's for users whom
> really know what they're doing.
> 
> Now to that end as I'm going through the series I had a question.
> 
> What's the interaction model here between this option and push/pop
> pragmas or attributes that set a tuning/mcpu.
> 
> At the moment the flag is handled in override_options which means
> It will override all cost models.
> 
> e.g.
> 
> __attribute__ ((target("cpu=neoverse-n1")))
> void foo ()
> {}
> 
> Would be overridden.  Is this the intention, or did we only want to
> override the default/global cost model with this option?

Thanks for bringing this up, I hadn't thought about target attributes when I did
implement it. As I understand, if override_options is given hierarchy over push/
pop pragmas or attributes, then it's fair to say that we want CLI to have
priority?

Then, in a similar vein, -muser-provided-CPU=neoverse-v2.json should have
priority over __attribute__ ((target("cpu=neoverse-n1"))).

Another point of contention here is -moverride itself, I suppose this should
override -muser-provided-CPU, which it doesn't in the current implementation. 

I think then what makes sense to me in order of hierarchy is:

-moverride > -muser-provided-CPU > attributes/pragmas

What do you think?

Thanks,
Soumya

> 
> Thanks,
> Tamar
>> 
>> CCing Richard's personal email here (in case you would still like to give
>> feedback on this). Thank you for all your help with this feature, as well as 
>> all
>> of my other work! :)
>> 
>> Best,
>> Soumya
>> 
>> Soumya AR (6):
>>  aarch64 + arm: Remove const keyword from tune_params members and
>>    nested members
>>  aarch64: Enable dumping of AArch64 CPU tuning parameters to JSON
>>  json: Add get_map() method to JSON object class
>>  aarch64: Enable parsing of user-provided AArch64 CPU tuning parameters
>>  aarch64: Regression tests for parsing of user-provided AArch64 CPU
>>    tuning parameters
>>  aarch64: Script to auto generate JSON tuning routines
>> 
>> gcc/config.gcc                                |   2 +-
>> .../aarch64-generate-json-tuning-routines.py  | 374 +++++++++++++
>> gcc/config/aarch64/aarch64-json-schema.h      | 261 +++++++++
>> .../aarch64-json-tunings-parser-generated.inc | 355 ++++++++++++
>> .../aarch64/aarch64-json-tunings-parser.cc    | 527 ++++++++++++++++++
>> .../aarch64/aarch64-json-tunings-parser.h     |  29 +
>> ...aarch64-json-tunings-printer-generated.inc | 439 +++++++++++++++
>> .../aarch64/aarch64-json-tunings-printer.cc   | 137 +++++
>> .../aarch64/aarch64-json-tunings-printer.h    |  28 +
>> gcc/config/aarch64/aarch64-opts.h             |   6 +-
>> gcc/config/aarch64/aarch64-protos.h           | 169 +++---
>> gcc/config/aarch64/aarch64-tuning-enums.def   |  37 ++
>> gcc/config/aarch64/aarch64.cc                 |  20 +
>> gcc/config/aarch64/aarch64.opt                |   8 +
>> gcc/config/aarch64/t-aarch64                  |  21 +
>> gcc/config/arm/aarch-common-protos.h          | 128 ++---
>> gcc/json.h                                    |  26 +-
>> gcc/selftest-run-tests.cc                     |   1 +
>> gcc/selftest.h                                |   1 +
>> .../aarch64-json-tunings.exp                  |  35 ++
>> .../aarch64/aarch64-json-tunings/boolean-1.c  |   6 +
>> .../aarch64-json-tunings/boolean-1.json       |   9 +
>> .../aarch64/aarch64-json-tunings/boolean-2.c  |   7 +
>> .../aarch64-json-tunings/boolean-2.json       |   9 +
>> .../aarch64-json-tunings/empty-brackets.c     |   6 +
>> .../aarch64-json-tunings/empty-brackets.json  |   1 +
>> .../aarch64/aarch64-json-tunings/empty.c      |   6 +
>> .../aarch64/aarch64-json-tunings/empty.json   |   0
>> .../aarch64/aarch64-json-tunings/enum-1.c     |   8 +
>> .../aarch64/aarch64-json-tunings/enum-1.json  |   7 +
>> .../aarch64/aarch64-json-tunings/enum-2.c     |   7 +
>> .../aarch64/aarch64-json-tunings/enum-2.json  |   7 +
>> .../aarch64/aarch64-json-tunings/integer-1.c  |   7 +
>> .../aarch64-json-tunings/integer-1.json       |   6 +
>> .../aarch64/aarch64-json-tunings/integer-2.c  |   7 +
>> .../aarch64-json-tunings/integer-2.json       |   6 +
>> .../aarch64/aarch64-json-tunings/integer-3.c  |   7 +
>> .../aarch64-json-tunings/integer-3.json       |   5 +
>> .../aarch64/aarch64-json-tunings/integer-4.c  |   6 +
>> .../aarch64-json-tunings/integer-4.json       |   5 +
>> .../aarch64/aarch64-json-tunings/string-1.c   |   8 +
>> .../aarch64-json-tunings/string-1.json        |   7 +
>> .../aarch64/aarch64-json-tunings/string-2.c   |   7 +
>> .../aarch64-json-tunings/string-2.json        |   5 +
>> .../aarch64/aarch64-json-tunings/test-all.c   |  58 ++
>> .../aarch64-json-tunings/test-all.json        |  39 ++
>> .../aarch64-json-tunings/unidentified-key.c   |   6 +
>> .../unidentified-key.json                     |   5 +
>> 48 files changed, 2705 insertions(+), 156 deletions(-)
>> create mode 100644 gcc/config/aarch64/aarch64-generate-json-tuning-
>> routines.py
>> create mode 100644 gcc/config/aarch64/aarch64-json-schema.h
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser-
>> generated.inc
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser.cc
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-parser.h
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer-
>> generated.inc
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer.cc
>> create mode 100644 gcc/config/aarch64/aarch64-json-tunings-printer.h
>> create mode 100644 gcc/config/aarch64/aarch64-tuning-enums.def
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/aarch64-json-tunings.exp
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/boolean-1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/boolean-1.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/boolean-2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/boolean-2.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/empty-brackets.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/empty-brackets.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/empty.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/empty.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/enum-1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/enum-1.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/enum-2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/enum-2.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-1.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-2.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-3.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-3.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-4.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/integer-4.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/string-1.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/string-1.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/string-2.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/string-2.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/test-all.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/test-all.json
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/unidentified-key.c
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/aarch64-json-
>> tunings/unidentified-key.json
>> 
>> --
>> 2.44.0


Reply via email to