> On 23 Apr 2025, at 13:47, Richard Sandiford <richard.sandif...@arm.com> wrote:
> 
> Thanks for all the feedback.  I've tried to address it in the version
> below.  I'll push later today if there are no further comments.
> 
> Richard
> 
> 
> The list is structured as:
> 
> - new configurations
> - command-line changes
> - ACLE changes
> - everything else
> 
> As usual, the list of new architectures, CPUs, and features is from a
> purely mechanical trawl of the associated .def files.  I've identified
> features by their architectural name to try to improve searchability.
> Similarly, the list of ACLE changes includes the associated ACLE
> feature macros, again to try to improve searchability.
> 
> The list summarises some of the target-specific optimisations because
> it sounded like Tamar had received feedback that people found such
> information interesting.
> 
> I've used the passive voice for most entries, to try to follow the
> style used elsewhere.
> 
> We don't yet define __ARM_FEATURE_FAMINMAX, but I'll fix that
> separately.

Thanks again for doing this...

> 
> +  <li>Support has been added for the following features of the Arm C
> +    Language Extensions
> +    (<a href="https://github.com/ARM-software/acle">ACLE</a>):
> +    <ul>
> +      <li>guarded control stacks</li>
> +      <li>lookup table instructions with 2-bit and 4-bit indices
> +        (predefined macro
> +        <code>__ARM_FEATURE_LUT</code>, enabled by <code>+lut</code>)
> +      </li>
> +      <li>floating-point absolute minimum and maximum instructions
> +        (predefined macro <code>__ARM_FEATURE_FAMINMAX</code>,
> +        enabled by <code>+faminmax</code>)
> +      </li>
> +      <li>FP8 conversions (predefined macro
> +        <code>__ARM_FEATURE_FP8</code>, enabled by <code>+fp8</code>)
> +      </li>
> +      <li>FP8 2-way dot product to half precision instructions
> +        (predefined macro <code>__ARM_FEATURE_FP8DOT2</code>,
> +        enabled by <code>+fp8dot2</code>)
> +      </li>
> +      <li>FP8 4-way dot product to single precision instructions
> +        (predefined macro <code>__ARM_FEATURE_FP8DOT4</code>,
> +        enabled by <code>+fp8dot4</code>)
> +      </li>
> +      <li>FP8 multiply-accumulate to half precision and single precision
> +        instructions (predefined macro <code>__ARM_FEATURE_FP8FMA</code>,
> +        enabled by <code>+fp8fma</code>)
> +      </li>
> +      <li>SVE FP8 2-way dot product to half precision instructions
> +        (predefined macro <code>__ARM_FEATURE_SSVE_FP8DOT2</code>,
> +        enabled by <code>+ssve-fp8dot2</code>)
> +      </li>
> +      <li>SVE FP8 4-way dot product to single precision instructions
> +        (predefined macro <code>__ARM_FEATURE_SSVE_FP8DOT4</code>,
> +        enabled by <code>+ssve-fp8dot4</code>)
> +      </li>
> +      <li>SVE FP8 multiply-accumulate to half precision and single precision
> +        instructions (predefined macro <code>__ARM_FEATURE_SSVE_FP8FMA</code>,
> +        enabled by <code>+ssve-fp8fma</code>)
> +      </li>

… Should these say “SSVE FP8” rather than “SVE FP8”?
Thanks,
Kyrill

> +      <li>SVE2.1 instructions (predefined macro
> +        <code>__ARM_FEATURE_SVE2p1</code>, enabled by <code>+sve2p1</code>)
> +      </li>
> +      <li>SVE non-widening bfloat16 instructions
> +        (predefined macro <code>__ARM_FEATURE_SVE_B16B16</code>,
> +        enabled by <code>+sve-b16b16</code>)
> +      </li>
> +      <li>SME2.1 instructions (predefined macro
> +        <code>__ARM_FEATURE_SME2p1</code>, enabled by <code>+sme2p1</code>)
> +      </li>
> +      <li>SME non-widening bfloat16 instructions
> +        (predefined macro <code>__ARM_FEATURE_SME_B16B16</code>,
> +        enabled by <code>+sme-b16b16</code>)
> +      </li>
> +      <li>SME half-precision instructions
> +        (predefined macro <code>__ARM_FEATURE_SME_F16F16</code>,
> +        enabled by <code>+sme-f16f16</code>)
> +      </li>
> +      <li>using C and C++ prefix operators, infix operators, and postfix
> +        operators with scalable SVE ACLE types
> +        (predefined macro <code>__ARM_FEATURE_SVE_VECTOR_OPERATORS==2</code>,
> +        enabled by <code>+sve</code>)
> +      </li>
> +      <li><code>__fma</code> (in <code>arm_acle.h</code>)</li>
> +      <li><code>__fmaf</code> (in <code>arm_acle.h</code>)</li>
> +      <li><code>__chkfeat</code> (in <code>arm_acle.h</code>)</li>
> +    </ul>
> +  </li>
> +  <li>In addition, the following changes have been made to preexisting
> +    ACLE features:
> +    <ul>
> +      <li>The macros <code>__ARM_FEATURE_BF16</code> and
> +        <code>__ARM_FEATURE_SVE_BF16</code> are now predefined when the
> +        associated support is available.  Previous versions of GCC provided
> +        the associated intrinsics but did not predefine the macros.
> +      </li>
> +      <li>OpenMP tasks can now share scalable SVE vectors and predicates.
> +        However, offloading of scalable vectors and predicates is not
> +        supported.
> +      </li>
> +      <li>ACLE system register functions (such as <code>__arm_rsr</code>
> +        and <code>__arm_wsr</code>) no longer try to enforce the minimum
> +        architectural requirement.
> +      </li>
> +      <li>A warning is reported if code attempts to use the Function
> +        Multi-Versioning feature.  GCC's current implementation of this
> +        feature is still experimental and it does not conform to the
> +        ACLE specification.
> +      </li>
> +    </ul>
> +  </li>
> +  <li>Support has been added for the <code>indirect_return</code>
> +    function-type attribute, which indicates that a function might return
> +    via an indirect branch instead of via a normal return instruction.
> +  </li>
> +  <li>128-bit atomic operations have been extended to make use of
> +    FEAT_LRCPC3 instructions, when support for the instructions is
> +    detected at runtime.
> +  </li>
> +  <li>There have been many code-generation improvements to the AArch64 port.
> +    Some examples are:
> +    <ul>
> +      <li>automatic use of AArch64 CRC instructions</li>
> +      <li>automatic use of AArch64 saturating vector arithmetic
> +        instructions
> +      </li>
> +      <li>better code generation of population counts</li>
> +      <li>improved handling of floating-point and vector immediates</li>
> +      <li>improved handling of vector permutations</li>
> +      <li>more use of SVE instructions to optimize Advanced SIMD code</li>
> +      <li>more folding and simplification of SVE ACLE intrinsics</li>
> +      <li>improved CPU-specific tuning</li>
> +      <li>improved register allocation, such as eliminating some
> +        vector moves
> +      </li>
> +    </ul>
> +  </li>
> +</ul>
> 
> <h3 id="amdgcn">AMD GPU (GCN)</h3>
> 
> -- 
> 2.43.0
> 
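
For anyone reading the list of feature macros in the patch and wondering how they are meant to be consumed, here is a minimal sketch of compile-time feature detection. It only tests three of the macros named in the patch (__ARM_FEATURE_LUT, __ARM_FEATURE_SVE2p1 and __ARM_FEATURE_SVE_VECTOR_OPERATORS). The HAVE_* names and the two helper functions are hypothetical, made up for illustration; they are not part of GCC or ACLE. On a compiler or target that does not predefine the macros, every entry simply reports 0.

```c
#include <stdio.h>

/* Map each ACLE feature macro to 1 if the compiler predefines it, else 0.
   The HAVE_* names are hypothetical and exist only for this sketch.  */
#ifdef __ARM_FEATURE_LUT
#define HAVE_LUT 1
#else
#define HAVE_LUT 0
#endif

#ifdef __ARM_FEATURE_SVE2p1
#define HAVE_SVE2P1 1
#else
#define HAVE_SVE2P1 0
#endif

/* The patch describes operator support on scalable SVE types as
   __ARM_FEATURE_SVE_VECTOR_OPERATORS==2, so test for that value.  */
#if defined(__ARM_FEATURE_SVE_VECTOR_OPERATORS) \
    && __ARM_FEATURE_SVE_VECTOR_OPERATORS == 2
#define HAVE_SVE_OPERATORS 1
#else
#define HAVE_SVE_OPERATORS 0
#endif

/* Hypothetical helper: how many of the three features are enabled.  */
int acle_features_enabled(void)
{
  return HAVE_LUT + HAVE_SVE2P1 + HAVE_SVE_OPERATORS;
}

/* Hypothetical helper: print one line per feature tested above.  */
void print_acle_features(void)
{
  printf("+lut (__ARM_FEATURE_LUT): %d\n", HAVE_LUT);
  printf("+sve2p1 (__ARM_FEATURE_SVE2p1): %d\n", HAVE_SVE2P1);
  printf("SVE operators (__ARM_FEATURE_SVE_VECTOR_OPERATORS==2): %d\n",
         HAVE_SVE_OPERATORS);
}
```

Calling print_acle_features() from main and building with different -march= extension strings (for example adding +lut or +sve2p1, as listed in the patch) should change which lines report 1, assuming the compiler supports those extensions.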
