pengfei added inline comments.
================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13829 CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType()); + Builder.getFastMathFlags().setAllowReassoc(true); return Builder.CreateCall(F, {Ops[0], Ops[1]}); ---------------- spatel wrote: > spatel wrote: > > I haven't looked at this part of the compiler in a long time, so I was > > wondering how we handle FMF scope. It looks like there is already an > > FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So setting > > FMF here will not affect anything but this CreateCall(). > > > > Does that match your understanding? Should we have an extra regression test > > to make sure that does not change? > > > > I am imagining something like: > > > > ``` > > double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) { > > double S = _mm512_reduce_add_pd(__W) + ExtraAddOp; > > return S; > > } > > > > ``` > > > > Then we could confirm that `reassoc` is not applied to the `fadd` that > > follows the reduction call. > Currently (and we could say that this is an LLVM codegen bug), we will not > generate the optimal/expected reduction with `reassoc` alone. > > I think the x86 reduction definition is implicitly assuming that -0.0 is not > meaningful here, so we should add `nsz` too. > > The backend is expecting an explicit `nsz` on this op. Ie, I see this x86 asm > currently with only `reassoc`: > > ``` > vextractf64x4 $1, %zmm0, %ymm1 > vaddpd %zmm1, %zmm0, %zmm0 > vextractf128 $1, %ymm0, %xmm1 > vaddpd %xmm1, %xmm0, %xmm0 > vpermilpd $1, %xmm0, %xmm1 > vaddsd %xmm1, %xmm0, %xmm0 > vxorpd %xmm1, %xmm1, %xmm1 <--- create 0.0 > vaddsd %xmm1, %xmm0, %xmm0 <--- add it to the reduction result > ``` > > Alternatively (and I'm not sure where it is specified), we could replace the > default 0.0 argument with -0.0? Confirmed by new tests. ================ Comment at: clang/lib/CodeGen/CGBuiltin.cpp:13829 CGM.getIntrinsic(Intrinsic::vector_reduce_fadd, Ops[1]->getType()); + Builder.getFastMathFlags().setAllowReassoc(true); return Builder.CreateCall(F, {Ops[0], Ops[1]}); ---------------- pengfei wrote: > spatel wrote: > > spatel wrote: > > > I haven't looked at this part of the compiler in a long time, so I was > > > wondering how we handle FMF scope. It looks like there is already an > > > FMFGuard object in place -- CodeGenFunction::CGFPOptionsRAII(). So > > > setting FMF here will not affect anything but this CreateCall(). > > > > > > Does that match your understanding? Should we have an extra regression > > > test to make sure that does not change? > > > > > > I am imagining something like: > > > > > > ``` > > > double test_mm512_reduce_add_pd(__m512d __W, double ExtraAddOp) { > > > double S = _mm512_reduce_add_pd(__W) + ExtraAddOp; > > > return S; > > > } > > > > > > ``` > > > > > > Then we could confirm that `reassoc` is not applied to the `fadd` that > > > follows the reduction call. > > Currently (and we could say that this is an LLVM codegen bug), we will not > > generate the optimal/expected reduction with `reassoc` alone. > > > > I think the x86 reduction definition is implicitly assuming that -0.0 is > > not meaningful here, so we should add `nsz` too. > > > > The backend is expecting an explicit `nsz` on this op. Ie, I see this x86 > > asm currently with only `reassoc`: > > > > ``` > > vextractf64x4 $1, %zmm0, %ymm1 > > vaddpd %zmm1, %zmm0, %zmm0 > > vextractf128 $1, %ymm0, %xmm1 > > vaddpd %xmm1, %xmm0, %xmm0 > > vpermilpd $1, %xmm0, %xmm1 > > vaddsd %xmm1, %xmm0, %xmm0 > > vxorpd %xmm1, %xmm1, %xmm1 <--- create 0.0 > > vaddsd %xmm1, %xmm0, %xmm0 <--- add it to the reduction result > > ``` > > > > Alternatively (and I'm not sure where it is specified), we could replace > > the default 0.0 argument with -0.0? > Confirmed by new tests. I think there's no such assumption for fadd/fmul instructions. We do have it for fmin/fmax. So I think we don't need to add nsz here. ================ Comment at: clang/lib/Headers/avx512fintrin.h:9304 + * For floating points type, we always assume the elements are reassociable even + * if -fast-math is off. + ---------------- spatel wrote: > Also mention that sign of zero is indeterminate. We might use the LangRef > text as a model for what to say here: > https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fadd-intrinsic Got it. Thanks! ================ Comment at: clang/lib/Headers/avx512fintrin.h:9352 static __inline__ double __DEFAULT_FN_ATTRS512 _mm512_reduce_add_pd(__m512d __W) { return __builtin_ia32_reduce_fadd_pd512(0.0, __W); } ---------------- spatel wrote: > Ah - this is where the +0.0 is specified. This should be -0.0. We could still > add 'nsz' flag to be safe. -0.0 can fix the problem. But we don't need to add 'nsz'. We can add it if we can find a corner case. Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D96231/new/ https://reviews.llvm.org/D96231 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits