From: Matthew Malcomson <mmalcom...@nvidia.com>

Cc'ing in middle-end maintainers since I *think* that is the best group for the atomics machinery. Would appreciate a pointer if someone else would be better to Cc in.
Cc'ing in Joseph Myers since he's been very helpful w.r.t. floating point and libatomic so far.
Dropping Jonathan Wakely from Cc because the libstdc++ related things are now in, so avoiding the unnecessary ping.

Rebase & tweak of atomic fp fetch_{add,sub} patch series posted last year:
https://gcc.gnu.org/pipermail/gcc-patches/2024-November/668754.html

Changes from the last series are minor:
1) Rebased onto Prathamesh's automatic libatomic linking patch. The latest version of that is found at the link below (though this patch series was rebased onto the version one before this).
   https://gcc.gnu.org/pipermail/gcc-patches/2025-August/692287.html
2) Fixed some typos in comments.
3) Removed the "work without libatomic" flag that Joseph pointed out was unnecessary. Chose not to include it solely for testing.
4) Made the documentation changes suggested in last review.
   - N.b. that patch has been approved, so not Cc'ing anyone on that patch even though sending it upstream.
5) Added a patch for avoiding problems in x86_64 libstdc++. (Would appreciate extra attention on this patch -- it modifies a target hook in a backend that I'm not familiar with).

On top of that the context around the patch has changed a bit, so cover letter adjusted below:

This patchset introduces floating point versions of atomic fetch_add, fetch_sub, add_fetch and sub_fetch. Instructions for performing these operations have been directly available in GPU hardware for a while, and are now starting to get added to CPU ISAs with instructions like the AArch64 LDFADD. Clang has allowed floating point types to be used with these builtins for a while now (https://reviews.llvm.org/D71726). Introducing these new overloads to this builtin allows users to directly specify the operation needed and hence allows the compiler to provide optimised output if possible.
There is additional motivation to use such floating point type atomic operations in libstdc++ so that other compilers can use libstdc++ to generate optimal code for their own targets (e.g. NVC++ can use libstdc++ atomic<float>::fetch_add to generate optimal code for GPUs when using the `-stdpar` argument). Jonathan Wakely has already posted a patch introducing the use of these builtins into libstdc++ when they are available.

We intend to post a patch using the new AArch64 instructions later in this release cycle.

------------------------------
As standard with the existing atomic builtins, we add the same functions in libatomic, allowing a fallback for when a given target has not implemented these operations directly in hardware. In order to use these functions we need to have a naming scheme that encodes the type -- we use a suffix of _fp to denote that this operation is on a floating point type, and a further empty suffix to denote a double, 'f' to denote a float, and similar. The scheme for the second part of the suffix is taken from the existing builtins that have different versions for different floating point types -- e.g. __builtin_acosh, __builtin_acoshf, __builtin_acoshl, etc.

In order to add floating point functions to libatomic we updated the makefile machinery to use names more descriptive of the new setup (where the SIZE of the datatype can no longer be used to distinguish all operations from each other). Moreover we add a CAS loop implementation in fop_n.c that handles floating point exception information and handles casting between floating point and integral types when switching between applying the operation and using CAS to attempt to store.

------------------------------
As Joseph Myers pointed out in response to my RFC, when performing floating point operations in a CAS loop there is the floating point exception information to take care of.
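To illustrate the fallback described above, here is a minimal sketch of a CAS loop for fetch_add on float and double. It is not the code in fop_n.c: the `sketch_fetch_add_*` names and the `SKETCH_FETCH_ADD` macro are hypothetical, chosen only to mirror the _fp/_fpf suffix scheme, and the sketch uses the existing generic `__atomic_load` and `__atomic_compare_exchange` builtins (which compare the object representation) rather than explicit casts to an integral type. It also deliberately leaves out the floating point exception handling mentioned above; a real implementation has to deal with exceptions raised by an addition whose result is then thrown away because the compare-exchange failed.

```c
#include <stdbool.h>

/* Hypothetical macro, for illustration only.  SUFFIX follows the naming
   scheme from the cover letter ("fp" for double, "fpf" for float) and
   FTYPE is the floating point type being operated on.
   Each generated function returns the value held before the addition.
   NOTE: no floating point exception handling is done here.  */
#define SKETCH_FETCH_ADD(SUFFIX, FTYPE)                                  \
  static FTYPE                                                           \
  sketch_fetch_add_##SUFFIX (FTYPE *ptr, FTYPE val)                      \
  {                                                                      \
    FTYPE expected, desired;                                             \
    __atomic_load (ptr, &expected, __ATOMIC_RELAXED);                    \
    do                                                                   \
      desired = expected + val;                                          \
    while (!__atomic_compare_exchange (ptr, &expected, &desired, true,   \
                                       __ATOMIC_SEQ_CST,                 \
                                       __ATOMIC_RELAXED));               \
    return expected;                                                     \
  }

/* On a failed compare-exchange the builtin writes the current contents
   of *ptr into `expected`, so the loop recomputes from fresh data.  */
SKETCH_FETCH_ADD (fp, double)
SKETCH_FETCH_ADD (fpf, float)
```

For example, calling `sketch_fetch_add_fp (&x, 2.25)` on a double `x` holding 1.5 returns 1.5 and leaves 3.75 in `x`.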
In order to take care of this information I use the existing `atomic_assign_expand_fenv` target hook to generate code that checks this information. Partly due to the fact that this hook emits GENERIC code and partly due to the language-specific semantics of floating point exceptions, this means we now decide whether to emit a CAS loop in the frontend (during overload resolution). The frontend decides to only use the underlying builtin if the backend has an optab defined that can implement it directly.

------------------------------
Now that the expansion to a CAS loop is performed in overloaded builtin resolution, this means that if the user were to directly use a resolved version (e.g. `__atomic_fetch_add_fp` for a double) that would not expand into a CAS loop inline. Instead (assuming the optab is not implemented for this target) it would pass through and end up using the libatomic fallback.

This is not ideal, but I believe the complexity of adding another clause for this expansion to a CAS loop is not worth the benefit of handling a CAS loop expansion for this specific case (partly on the assumption that users would rarely specify the resolved version and partly on the belief that these resolved versions are not actually part of the user-facing interface -- since they're not documented in the manual and don't seem to be used enough for clang to expose the interface).

I considered not exposing the resolved versions to the user (similar to the interface that _BitInt exposes) and instead handling them as an internal function that could expand to call the libatomic implementation. I chose not to do that for consistency with the rest of the atomic builtins.

------------------------------
There are a few places throughout the compiler that handle such atomic builtins which I have not updated to handle floating point atomic builtins. Places like asan, tsan, gimple-ssa-warn-access, analyzer, and tree-ssa-forwprop would need to be updated eventually.
However, since the current state of GCC is that no backend implements these optabs directly, the generic version of the builtin is always expanded as a CAS loop in the frontend -- this means these mid-end passes will not see any of these builtins except in the case that the user explicitly calls the resolved version. I am hoping to update these places in a later patch (the patch where we introduce the backend expansions).

------------------------------
Without adjustment, ix86_atomic_assign_expand_fenv generates code that gets broken during optimisation by `fold`. I believe the code returned by this function was incorrect (maybe only bad for C++?). The expression that gets incorrectly optimised is along the lines of:
  COMPOUND_EXPR<TARGET_EXPR<var1, some-init>, TARGET_EXPR<var2, expression-with-var1>>
and `fold` (which gets called by `cp_fold`) removes the first TARGET_EXPR since it doesn't look like it has side effects (even though the variable it sets is used in the second expression). Adding `TREE_SIDE_EFFECTS` markers to this expression avoids the problem.

------------------------------
Testing done:
Bootstrap and regression test passes on x86_64 and AArch64 (when run on top of the libatomic autoinclude patch that Prathamesh has posted).
Cross compiler regression tests pass on arm-linux.
Cross compiler regression tests pass on AArch64 linux with Qemu emulating a machine that does not have LSE.
Similarly tested with a dummy implementation of fetch_add as an optab in the AArch64 backend to ensure that codepath also works.
------------------------------

Matthew Malcomson (10):
  libatomic: Split concept of SUFFIX and SIZE in libatomic
  libatomic: Add floating point implementations of fetch_{add,sub}
  c: c++: Define new floating point builtin fetch_add functions
  builtins: Add FP types for atomic builtin overload resolution
  c: c++: Expand into CAS loop in frontend
  builtins: optab: Tie the new atomic builtins to the backend
  testsuite: Add tests for fp resolutions of __atomic_fetch_add
  doc: Mention floating point atomic fetch_add etc in docs
  [Not For Commit] Add demo implementation of one of the operations
  i386: Mark a tree node in i386.cc as TREE_SIDE_EFFECTS

 gcc/builtin-types.def                         |   20 +
 gcc/builtins.cc                               |  176 ++
 gcc/builtins.h                                |    2 +
 gcc/c-family/c-common.cc                      |  217 ++-
 gcc/config/aarch64/aarch64.h                  |    2 +
 gcc/config/aarch64/aarch64.opt                |    5 +
 gcc/config/aarch64/atomics.md                 |   15 +
 gcc/config/i386/i386.cc                       |   17 +
 gcc/doc/extend.texi                           |    9 +
 gcc/fortran/f95-lang.cc                       |    5 +
 gcc/fortran/types.def                         |   17 +
 gcc/optabs.cc                                 |   19 +
 gcc/optabs.def                                |    6 +-
 gcc/sync-builtins.def                         |   40 +
 .../template/builtin-atomic-overloads.def     |   28 +-
 .../template/builtin-atomic-overloads6.C      |   23 +-
 .../template/builtin-atomic-overloads7.C      |   16 +-
 gcc/testsuite/gcc.dg/atomic-op-fp-convert.c   |    6 +
 gcc/testsuite/gcc.dg/atomic-op-fp-errs.c      |   14 +
 .../gcc.dg/atomic-op-fp-resolve-complain.c    |    5 +
 gcc/testsuite/gcc.dg/atomic-op-fp.c           |  198 +++
 gcc/testsuite/gcc.dg/atomic-op-fpf.c          |  198 +++
 gcc/testsuite/gcc.dg/atomic-op-fpf128.c       |  201 +++
 gcc/testsuite/gcc.dg/atomic-op-fpf16.c        |  201 +++
 gcc/testsuite/gcc.dg/atomic-op-fpf16b.c       |  201 +++
 gcc/testsuite/gcc.dg/atomic-op-fpf32.c        |  201 +++
 gcc/testsuite/gcc.dg/atomic-op-fpf32x.c       |  201 +++
 gcc/testsuite/gcc.dg/atomic-op-fpf64.c        |  201 +++
 gcc/testsuite/gcc.dg/atomic-op-fpf64x.c       |  201 +++
 gcc/testsuite/gcc.dg/atomic-op-fpl.c          |  198 +++
 .../gcc.dg/atomic/atomic-op-fp-fenv.c         |  376 +++++
 .../gcc.target/i386/excess-precision-13.c     |   87 +
 gcc/testsuite/lib/target-supports.exp         |  199 ++-
 libatomic/Makefile.am                         |   46 +-
 libatomic/Makefile.in                         |   49 +-
 libatomic/acinclude.m4                        |   56 +-
 libatomic/auto-config.h.in                    |  114 +-
 libatomic/cas_n.c                             |    8 +-
 libatomic/config/linux/aarch64/host-config.h  |   23 +-
 libatomic/config/linux/arm/host-config.h      |    2 +-
 libatomic/config/s390/cas_n.c                 |    6 +-
 libatomic/config/s390/exch_n.c                |    4 +-
 libatomic/config/s390/load_n.c                |    4 +-
 libatomic/config/s390/store_n.c               |    4 +-
 libatomic/config/x86/host-config.h            |   14 +-
 libatomic/configure                           | 1485 ++++++++++++++++-
 libatomic/configure.ac                        |    6 +
 libatomic/exch_n.c                            |   12 +-
 libatomic/fadd_n.c                            |   23 +-
 libatomic/fop_n.c                             |  111 +-
 libatomic/fsub_n.c                            |   23 +
 libatomic/libatomic.map                       |   44 +
 libatomic/libatomic_i.h                       |  186 ++-
 libatomic/load_n.c                            |   12 +-
 libatomic/store_n.c                           |   12 +-
 libatomic/tas_n.c                             |   12 +-
 libatomic/testsuite/Makefile.in               |    1 +
 .../testsuite/libatomic.c/atomic-op-fp-fenv.c |  421 +++++
 .../testsuite/libatomic.c/atomic-op-fp.c      |  219 +++
 .../testsuite/libatomic.c/atomic-op-fpf.c     |  219 +++
 .../testsuite/libatomic.c/atomic-op-fpf128.c  |  220 +++
 .../testsuite/libatomic.c/atomic-op-fpf16.c   |  223 +++
 .../testsuite/libatomic.c/atomic-op-fpf16b.c  |  220 +++
 .../testsuite/libatomic.c/atomic-op-fpf32.c   |  220 +++
 .../testsuite/libatomic.c/atomic-op-fpf32x.c  |  220 +++
 .../testsuite/libatomic.c/atomic-op-fpf64.c   |  220 +++
 .../testsuite/libatomic.c/atomic-op-fpf64x.c  |  220 +++
 .../testsuite/libatomic.c/atomic-op-fpl.c     |  219 +++
 68 files changed, 7943 insertions(+), 240 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp-convert.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp-errs.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp-resolve-complain.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fp.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf128.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf16b.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf32x.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpf64x.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic-op-fpl.c
 create mode 100644 gcc/testsuite/gcc.dg/atomic/atomic-op-fp-fenv.c
 create mode 100644 gcc/testsuite/gcc.target/i386/excess-precision-13.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fp-fenv.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fp.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf128.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf16b.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf32x.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpf64x.c
 create mode 100644 libatomic/testsuite/libatomic.c/atomic-op-fpl.c

--
2.43.0