pengfei added inline comments.

================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13829
         CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType());
+    Builder.getFastMathFlags().setAllowReassoc(true);
     return Builder.CreateCall(F, {Ops[0], Ops[1]});
----------------
spatel wrote:
> spatel wrote:
> > I haven't looked at this part of the compiler in a long time, so I was 
> > wondering how we handle FMF scope. It looks like there is already an 
> > FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting 
> > FMF here will not affect anything but this CreateCall(). 
> > 
> > Does that match your understanding? Should we have an extra regression test 
> > to make sure that does not change?
> > 
> > I am imagining something like:
> > 
> > ```
> > double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) {
> >   double S = _mm512_reduce_add_pd(__W) + ExtraAddOp;
> >   return S;
> > }
> > 
> > ```
> > 
> > Then we could confirm that `reassoc` is not applied to the `fadd` that 
> > follows the reduction call.
> Currently (and we could say that this is an LLVM codegen bug), we will not 
> generate the optimal/expected reduction with `reassoc` alone.
> 
> I think the x86 reduction definition is implicitly assuming that -0.0 is not 
> meaningful here, so we should add `nsz` too.
> 
> The backend is expecting an explicit `nsz` on this op. Ie, I see this x86 asm 
> currently with only `reassoc`:
>       
> ```
>       vextractf64x4   $1, %zmm0, %ymm1
>       vaddpd  %zmm1, %zmm0, %zmm0
>       vextractf128    $1, %ymm0, %xmm1
>       vaddpd  %xmm1, %xmm0, %xmm0
>       vpermilpd       $1, %xmm0, %xmm1      
>       vaddsd  %xmm1, %xmm0, %xmm0 
>       vxorpd  %xmm1, %xmm1, %xmm1   <--- create 0.0
>       vaddsd  %xmm1, %xmm0, %xmm0   <--- add it to the reduction result
> ```
> 
> Alternatively (and I'm not sure where it is specified), we could replace the 
> default 0.0 argument with -0.0?
Confirmed by new tests.
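For reference, a sketch of the regression check spatel describes above — the function and the expected-IR notes are illustrative, not the exact test that was added (assumes -mavx512f):

```
#include <immintrin.h>

// The 'reassoc' flag added for the builtin should appear on the generated
// @llvm.vector.reduce.fadd call, but must NOT leak onto the fadd that adds
// ExtraAddOp to the reduction result afterwards.
double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) {
  double S = _mm512_reduce_add_pd(__W) + ExtraAddOp;
  return S;
}
```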


================
Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13829
         CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType());
+    Builder.getFastMathFlags().setAllowReassoc(true);
     return Builder.CreateCall(F, {Ops[0], Ops[1]});
----------------
pengfei wrote:
> spatel wrote:
> > spatel wrote:
> > > I haven't looked at this part of the compiler in a long time, so I was 
> > > wondering how we handle FMF scope. It looks like there is already an 
> > > FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So 
> > > setting FMF here will not affect anything but this CreateCall(). 
> > > 
> > > Does that match your understanding? Should we have an extra regression 
> > > test to make sure that does not change?
> > > 
> > > I am imagining something like:
> > > 
> > > ```
> > > double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) {
> > >   double S = _mm512_reduce_add_pd(__W) + ExtraAddOp;
> > >   return S;
> > > }
> > > 
> > > ```
> > > 
> > > Then we could confirm that `reassoc` is not applied to the `fadd` that 
> > > follows the reduction call.
> > Currently (and we could say that this is an LLVM codegen bug), we will not 
> > generate the optimal/expected reduction with `reassoc` alone.
> > 
> > I think the x86 reduction definition is implicitly assuming that -0.0 is 
> > not meaningful here, so we should add `nsz` too.
> > 
> > The backend is expecting an explicit `nsz` on this op. Ie, I see this x86 
> > asm currently with only `reassoc`:
> >     
> > ```
> >     vextractf64x4   $1, %zmm0, %ymm1
> >     vaddpd  %zmm1, %zmm0, %zmm0
> >     vextractf128    $1, %ymm0, %xmm1
> >     vaddpd  %xmm1, %xmm0, %xmm0
> >     vpermilpd       $1, %xmm0, %xmm1      
> >     vaddsd  %xmm1, %xmm0, %xmm0 
> >     vxorpd  %xmm1, %xmm1, %xmm1   <--- create 0.0
> >     vaddsd  %xmm1, %xmm0, %xmm0   <--- add it to the reduction result
> > ```
> > 
> > Alternatively (and I'm not sure where it is specified), we could replace 
> > the default 0.0 argument with -0.0?
> Confirmed by new tests.
I don't think there's such an assumption for fadd/fmul; we do have it 
for fmin/fmax. So I don't think we need to add `nsz` here.
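As background for the `nsz` question, the interaction with the start value is plain IEEE 754 zero arithmetic — a minimal standalone example, independent of the patch:

```
#include <stdio.h>

// Under IEEE 754 (round-to-nearest), x + (+0.0) is not always x: it turns
// -0.0 into +0.0. By contrast, x + (-0.0) preserves x for every x, so a
// -0.0 start value can be folded away without needing 'nsz'.
int main(void) {
  double neg_zero = -0.0;
  printf("%g\n", neg_zero + 0.0);    // prints 0  (sign of zero lost)
  printf("%g\n", neg_zero + (-0.0)); // prints -0 (value preserved)
  return 0;
}
```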


================
Comment at: clang/lib/Headers/avx512fintrin.h:9304
+ * For floating-point types, we always assume the elements are reassociable
+ * even if -fast-math is off.
+
----------------
spatel wrote:
> Also mention that sign of zero is indeterminate. We might use the LangRef 
> text as a model for what to say here:
> https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fadd-intrinsic
Got it. Thanks!
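Roughly what such a comment could say, modeled on the LangRef wording spatel points to (the exact phrasing in the patch may differ):

```
/* For floating-point intrinsics, the reduction is performed as if the
 * operations were reassociated even when -ffast-math is off, and the sign
 * of a zero result is indeterminate (see llvm.vector.reduce.fadd in LangRef). */
```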


================
Comment at: clang/lib/Headers/avx512fintrin.h:9352
 static __inline__ double __DEFAULT_FN_ATTRS512 _mm512_reduce_add_pd(__m512d __W) {
   return __builtin_ia32_reduce_fadd_pd512(0.0, __W);
 }
----------------
spatel wrote:
> Ah - this is where the +0.0 is specified. This should be -0.0. We could still 
> add 'nsz' flag to be safe.
Using -0.0 fixes the problem, but we don't need to add 'nsz'; we can add it 
later if a corner case turns up that requires it.
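Concretely, the fix converged on here would look something like this in avx512fintrin.h (a sketch; the committed change may differ in detail):

```
static __inline__ double __DEFAULT_FN_ATTRS512 _mm512_reduce_add_pd(__m512d __W) {
  /* Start from -0.0, the additive identity for fadd, so the start value can
   * be folded away without requiring an 'nsz' flag on the reduction. */
  return __builtin_ia32_reduce_fadd_pd512(-0.0, __W);
}
```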


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D96231/new/

https://reviews.llvm.org/D96231
