RE: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions emitted at -O3

xiezhiheng Fri, 09 Oct 2020 02:33:32 -0700

> -----Original Message-----
> From: Richard Sandiford [mailto:richard.sandif...@arm.com]
> Sent: Thursday, August 27, 2020 4:08 PM
> To: xiezhiheng <xiezhih...@huawei.com>
> Cc: Richard Biener <richard.guent...@gmail.com>; gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH PR94442] [AArch64] Redundant ldp/stp instructions
> emitted at -O3
> 
> xiezhiheng <xiezhih...@huawei.com> writes:
> > I made two separate patches for these two groups for review purposes.
> >
> > Note: Patch for min/max intrinsics should be applied before the patch for
> > rounding intrinsics
> >
> > Bootstrapped and tested on aarch64 Linux platform.
> 
> Thanks, LGTM.  Pushed to master.
> 
> Richard


I made the patch for the multiply and multiply-accumulate intrinsics.

Note that the bfmmlaq intrinsic is special: the underlying BFMMLA
instruction ignores the FPCR and does not update the FPSR exception status:

https://developer.arm.com/docs/ddi0596/h/simd-and-floating-point-instructions-alphabetic-order/bfmmla-bfloat16-floating-point-matrix-multiply-accumulate-into-2x2-matrix

So I set its flag to AUTO_FP.
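
As a rough illustration of why AUTO_FP matters for bfmmlaq, here is a simplified C model of how such flags could be resolved. It is a sketch, not the GCC source: the flag names follow the ones used in this thread, but the bit values and the helpers effective_flags and may_delete_unused_call are hypothetical, and the rule "floating-point modes get the FP properties unless AUTO_FP is set" is an assumption about the intended semantics.

```c
#include <stdbool.h>

/* Illustrative flag bits only; the values are assumptions, not GCC's.  */
enum {
  FLAG_NONE                = 0,
  FLAG_READ_FPCR           = 1 << 0,
  FLAG_RAISE_FP_EXCEPTIONS = 1 << 1,
  FLAG_AUTO_FP             = 1 << 5,
  FLAG_FP = FLAG_READ_FPCR | FLAG_RAISE_FP_EXCEPTIONS
};

/* Hypothetical helper: work out the properties a call really has.
   A floating-point mode implies "reads FPCR, may raise exceptions"
   unless AUTO_FP suppresses that default.  */
static unsigned int
effective_flags (unsigned int flags, bool float_mode, bool trapping_math)
{
  if (!(flags & FLAG_AUTO_FP) && float_mode)
    flags |= FLAG_FP;

  /* Without trapping math, FP exceptions are not user-visible.  */
  if (!trapping_math)
    flags &= ~(unsigned int) FLAG_RAISE_FP_EXCEPTIONS;

  return flags;
}

/* Hypothetical helper: in this model, a call whose result is unused
   can be dropped only if it cannot raise an FP exception.  */
static bool
may_delete_unused_call (unsigned int flags, bool float_mode,
                        bool trapping_math)
{
  unsigned int f = effective_flags (flags, float_mode, trapping_math);
  return (f & FLAG_RAISE_FP_EXCEPTIONS) == 0;
}
```

Under this model, a floating-point intrinsic marked AUTO_FP (as proposed for bfmmlaq) is treated as free of FP side effects even with trapping math enabled, whereas one left at the default is not.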

Bootstrapped and tested on aarch64 Linux platform.

Thanks,
Xie Zhiheng


diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 75b62b590e2..8ca9746189a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2020-10-09  Zhiheng Xie  <xiezhih...@huawei.com>
+           Nannan Zheng  <zhengnan...@huawei.com>
+
+       * config/aarch64/aarch64-simd-builtins.def: Add proper FLAG
+       for mul/mla/mls intrinsics.
+

pr94442-v1.patch
