> On 23 Apr 2025, at 13:47, Richard Sandiford <richard.sandif...@arm.com> wrote:
>
> Thanks for all the feedback. I've tried to address it in the version
> below. I'll push later today if there are no further comments.
>
> Richard
>
>
> The list is structured as:
>
> - new configurations
> - command-line changes
> - ACLE changes
> - everything else
>
> As usual, the list of new architectures, CPUs, and features is from a
> purely mechanical trawl of the associated .def files. I've identified
> features by their architectural name to try to improve searchability.
> Similarly, the list of ACLE changes includes the associated ACLE
> feature macros, again to try to improve searchability.
>
> The list summarises some of the target-specific optimisations because
> it sounded like Tamar had received feedback that people found such
> information interesting.
>
> I've used the passive tense for most entries, to try to follow the
> style used elsewhere.
>
> We don't yet define __ARM_FEATURE_FAMINMAX, but I'll fix that
> separately.
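The cover letter notes that the ACLE entries name their feature macros to aid searchability; those same macros are what user code tests at compile time. The following is a minimal sketch, assuming a C compiler targeting AArch64 (on other targets none of these macros are defined, so the helper finds nothing). `acle_features_enabled` is a hypothetical helper, and only a handful of the macros from the patch are shown. Because `__ARM_FEATURE_FAMINMAX` is not yet predefined, code guarded on it would take its fallback path even with `+faminmax` until that is fixed.

```c
#include <stddef.h>

/* Hypothetical helper (not part of GCC or the patch): stores the names of
   some GCC 15 ACLE feature macros that are predefined for the current
   -march/+feature settings, and returns how many were found.  At most
   `max` names are stored; the return value counts all of them. */
static size_t acle_features_enabled(const char **out, size_t max)
{
    size_t n = 0;
    (void)out;  /* silence unused-parameter warnings when nothing matches */
    (void)max;
#define RECORD(name) do { if (n < max) out[n] = (name); n++; } while (0)
#ifdef __ARM_FEATURE_LUT
    RECORD("__ARM_FEATURE_LUT");        /* enabled by +lut */
#endif
#ifdef __ARM_FEATURE_FAMINMAX
    RECORD("__ARM_FEATURE_FAMINMAX");   /* +faminmax; may not be predefined yet */
#endif
#ifdef __ARM_FEATURE_FP8
    RECORD("__ARM_FEATURE_FP8");        /* enabled by +fp8 */
#endif
#ifdef __ARM_FEATURE_SVE2p1
    RECORD("__ARM_FEATURE_SVE2p1");     /* enabled by +sve2p1 */
#endif
#ifdef __ARM_FEATURE_SME2p1
    RECORD("__ARM_FEATURE_SME2p1");     /* enabled by +sme2p1 */
#endif
#undef RECORD
    return n;
}
```

You can check which of these macros a given GCC release predefines by building the same file with and without one of the listed `+feature` modifiers.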
Thanks again for doing this...

>
> + <li>Support has been added for the following features of the Arm C
> + Language Extensions
> + (<a href="https://github.com/ARM-software/acle">ACLE</a>):
> + <ul>
> + <li>guarded control stacks</li>
> + <li>lookup table instructions with 2-bit and 4-bit indices
> + (predefined macro
> + <code>__ARM_FEATURE_LUT</code>, enabled by <code>+lut</code>)
> + </li>
> + <li>floating-point absolute minimum and maximum instructions
> + (predefined macro <code>__ARM_FEATURE_FAMINMAX</code>,
> + enabled by <code>+faminmax</code>)
> + </li>
> + <li>FP8 conversions (predefined macro
> + <code>__ARM_FEATURE_FP8</code>, enabled by <code>+fp8</code>)
> + </li>
> + <li>FP8 2-way dot product to half precision instructions
> + (predefined macro <code>__ARM_FEATURE_FP8DOT2</code>,
> + enabled by <code>+fp8dot2</code>)
> + </li>
> + <li>FP8 4-way dot product to single precision instructions
> + (predefined macro <code>__ARM_FEATURE_FP8DOT4</code>,
> + enabled by <code>+fp8dot4</code>)
> + </li>
> + <li>FP8 multiply-accumulate to half precision and single precision
> + instructions (predefined macro <code>__ARM_FEATURE_FP8FMA</code>,
> + enabled by <code>+fp8fma</code>)
> + </li>
> + <li>SVE FP8 2-way dot product to half precision instructions
> + (predefined macro <code>__ARM_FEATURE_SSVE_FP8DOT2</code>,
> + enabled by <code>+ssve-fp8dot2</code>)
> + </li>
> + <li>SVE FP8 4-way dot product to single precision instructions
> + (predefined macro <code>__ARM_FEATURE_SSVE_FP8DOT4</code>,
> + enabled by <code>+ssve-fp8dot4</code>)
> + </li>
> + <li>SVE FP8 multiply-accumulate to half precision and single precision
> + instructions (predefined macro <code>__ARM_FEATURE_SSVE_FP8FMA</code>,
> + enabled by <code>+ssve-fp8fma</code>)
> + </li>

…

Should these say “SSVE FP8” rather than “SVE FP8”?
Thanks,
Kyrill

> + <li>SVE2.1 instructions (predefined macro
> + <code>__ARM_FEATURE_SVE2p1</code>, enabled by <code>+sve2p1</code>)
> + </li>
> + <li>SVE non-widening bfloat16 instructions
> + (predefined macro <code>__ARM_FEATURE_SVE_B16B16</code>,
> + enabled by <code>+sve-b16b16</code>)
> + </li>
> + <li>SME2.1 instructions (predefined macro
> + <code>__ARM_FEATURE_SME2p1</code>, enabled by <code>+sme2p1</code>)
> + </li>
> + <li>SME non-widening bfloat16 instructions
> + (predefined macro <code>__ARM_FEATURE_SME_B16B16</code>,
> + enabled by <code>+sme-b16b16</code>)
> + </li>
> + <li>SME half-precision instructions
> + (predefined macro <code>__ARM_FEATURE_SME_F16F16</code>,
> + enabled by <code>+sme-f16f16</code>)
> + </li>
> + <li>using C and C++ prefix operators, infix operators, and postfix
> + operators with scalable SVE ACLE types
> + (predefined macro <code>__ARM_FEATURE_SVE_VECTOR_OPERATORS==2</code>,
> + enabled by <code>+sve</code>)
> + </li>
> + <li><code>__fma</code> (in <code>arm_acle.h</code>)</li>
> + <li><code>__fmaf</code> (in <code>arm_acle.h</code>)</li>
> + <li><code>__chkfeat</code> (in <code>arm_acle.h</code>)</li>
> + </ul>
> + </li>
> + <li>In addition, the following changes have been made to preexisting
> + ACLE features:
> + <ul>
> + <li>The macros <code>__ARM_FEATURE_BF16</code> and
> + <code>__ARM_FEATURE_SVE_BF16</code> are now predefined when the
> + associated support is available. Previous versions of GCC provided
> + the associated intrinsics but did not predefine the macros.
> + </li>
> + <li>OpenMP tasks can now share scalable SVE vectors and predicates.
> + However, offloading of scalable vectors and predicates is not
> + supported.
> + </li>
> + <li>ACLE system register functions (such as <code>__arm_rsr</code>
> + and <code>__arm_wsr</code>) no longer try to enforce the minimum
> + architectural requirement.
> + </li>
> + <li>A warning is reported if code attempts to use the Function
> + Multi-Versioning feature. GCC's current implementation of this
> + feature is still experimental and it does not conform to the
> + ACLE specification.
> + </li>
> + </ul>
> + </li>
> + <li>Support has been added for the <code>indirect_return</code>
> + function-type attribute, which indicates that a function might return
> + via an indirect branch instead of via a normal return instruction.
> + </li>
> + <li>128-bit atomic operations have been extended to make use of
> + FEAT_LRCPC3 instructions, when support for the instructions is
> + detected at runtime.
> + </li>
> + <li>There have been many code-generation improvements to the AArch64 port.
> + Some examples are:
> + <ul>
> + <li>automatic use of AArch64 CRC instructions</li>
> + <li>automatic use of AArch64 saturating vector arithmetic
> + instructions
> + </li>
> + <li>better code generation of population counts</li>
> + <li>improved handling of floating-point and vector immediates</li>
> + <li>improved handling of vector permutations</li>
> + <li>more use of SVE instructions to optimize Advanced SIMD code</li>
> + <li>more folding and simplification of SVE ACLE intrinsics</li>
> + <li>improved CPU-specific tuning</li>
> + <li>improved register allocation, such as eliminating some
> + vector moves
> + </li>
> + </ul>
> + </li>
> +</ul>
>
> <h3 id="amdgcn">AMD GPU (GCN)</h3>
>
> --
> 2.43.0
>
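The `__ARM_FEATURE_SVE_VECTOR_OPERATORS==2` entry means that ordinary C operators can be applied directly to sizeless SVE types such as `svint32_t`, not only to fixed-length ones. The following is a minimal sketch of an element-wise add, assuming GCC 15 with `+sve`. The function name `add_arrays` is hypothetical, and the scalar `#else` branch exists only so that the sketch also builds where the macro is absent or below 2.

```c
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE) && defined(__ARM_FEATURE_SVE_VECTOR_OPERATORS) \
    && __ARM_FEATURE_SVE_VECTOR_OPERATORS >= 2
#include <arm_sve.h>

/* Hypothetical example: dst[i] = a[i] + b[i] using sizeless SVE vectors
   and the infix '+' operator (operator support level 2). */
static void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b,
                       int64_t n)
{
    for (int64_t i = 0; i < n; i += (int64_t)svcntw()) {
        svbool_t pg = svwhilelt_b32(i, n);  /* predicate masks off the tail */
        svint32_t va = svld1(pg, a + i);    /* inactive lanes load as zero */
        svint32_t vb = svld1(pg, b + i);
        svst1(pg, dst + i, va + vb);        /* only active lanes are stored */
    }
}
#else
/* Scalar fallback for targets without level-2 SVE operator support. */
static void add_arrays(int32_t *dst, const int32_t *a, const int32_t *b,
                       int64_t n)
{
    for (int64_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
#endif
```

Because `svld1` zeroes inactive lanes and `svst1` never writes them, the predicated loop handles the final partial vector without a separate scalar clean-up loop.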