anemet added inline comments.

================
Comment at: cfe/trunk/include/clang/Basic/LangOptions.def:220
+/// \brief FP_CONTRACT mode (on/off/fast).
+ENUM_LANGOPT(DefaultFPContractMode, FPContractModeKind, 2, FPC_Off, "FP 
contraction type")
 LANGOPT(NoBitFieldTypeAlign , 1, 0, "bit-field type alignment")
----------------
hfinkel wrote:
> anemet wrote:
> > hfinkel wrote:
> > > yaxunl wrote:
> > > > hfinkel wrote:
> > > > > yaxunl wrote:
> > > > > > anemet wrote:
> > > > > > > yaxunl wrote:
> > > > > > > > anemet wrote:
> > > > > > > > > yaxunl wrote:
> > > > > > > > > > This change seemed to cause some performance degradations 
> > > > > > > > > > in some OpenCL applications.
> > > > > > > > > > 
> > > > > > > > > > This option used to be on by default. Now it is off by 
> > > > > > > > > > default.
> > > > > > > > > > 
> > > > > > > > > > There are applications do separated compile/link/codegen 
> > > > > > > > > > stages. In the codegen stage, clang is invoked with .bc as 
> > > > > > > > > > input, therefore this option is off by default, whereas it 
> > > > > > > > > > was on by default before this change.
> > > > > > > > > > 
> > > > > > > > > > Is there any reason not to keep the original behavior?
> > > > > > > > > > 
> > > > > > > > > > Thanks.
> > > > > > > > > > This change seemed to cause some performance degradations 
> > > > > > > > > > in some OpenCL applications.
> > > > > > > > > > 
> > > > > > > > > > This option used to be on by default. Now it is off by 
> > > > > > > > > > default.
> > > > > > > > > 
> > > > > > > > > It's always been off.  It was set to fast for CUDA which 
> > > > > > > > > should still be the case.  See the changes further down on 
> > > > > > > > > the patch.
> > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > There are applications do separated compile/link/codegen 
> > > > > > > > > > stages. In the codegen stage, clang is invoked with .bc as 
> > > > > > > > > > input, therefore this option is off by default, whereas it 
> > > > > > > > > > was on by default before this change.
> > > > > > > > > > 
> > > > > > > > > > Is there any reason not to keep the original behavior?
> > > > > > > > > 
> > > > > > > > > Sorry but I am not sure what changed, can you elaborate on 
> > > > > > > > > what you're doing and how things used to work for you?
> > > > > > > > Before your change, -ffp-contract was a codegen option defined 
> > > > > > > > as
> > > > > > > > 
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > ENUM_CODEGENOPT(FPContractMode, FPContractModeKind, 2, FPC_On)
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > therefore the default value as on. After your change, it 
> > > > > > > > becomes off by default.
> > > > > > > No. -ffp-contract=on was a FE option and -ffp-contract=fast was a 
> > > > > > > backend option.  I still don't understand your use-case.  Can you 
> > > > > > > include a small testcase how this used to work before?
> > > > > > -ffp-contract=on used to be a codegen option before this change. In 
> > > > > > CodeGen/BackendUtil.cpp, before this change:
> > > > > > 
> > > > > > ```
> > > > > > switch (CodeGenOpts.getFPContractMode()) {
> > > > > >   switch (LangOpts.getDefaultFPContractMode()) {
> > > > > >   case LangOptions::FPC_Off:
> > > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Strict;
> > > > > >     break;
> > > > > >   case LangOptions::FPC_On:
> > > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
> > > > > >     break;
> > > > > >   case LangOptions::FPC_Fast:
> > > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
> > > > > >     break;
> > > > > >   }
> > > > > > ```
> > > > > > By default, -fp-contract=on, which results in 
> > > > > > llvm::FPOpFusion::Standard in the backend. This options allows 
> > > > > > backend to translate llvm.fmuladd to FMA machine instructions.
> > > > > > 
> > > > > > After this change, it becomes:
> > > > > > 
> > > > > > ```
> > > > > > switch (LangOpts.getDefaultFPContractMode()) {
> > > > > >   case LangOptions::FPC_Off:
> > > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Strict;
> > > > > >     break;
> > > > > >   case LangOptions::FPC_On:
> > > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Standard;
> > > > > >     break;
> > > > > >   case LangOptions::FPC_Fast:
> > > > > >     Options.AllowFPOpFusion = llvm::FPOpFusion::Fast;
> > > > > >     break;
> > > > > >   }
> > > > > > ```
> > > > > > now by default -ffp-contract=off, which results in  
> > > > > > llvm::FPOpFusion::Strict in the backend. This option disables 
> > > > > > backend translating llvm.fmuladd to FMA machine instructions in 
> > > > > > certain situations.
> > > > > > 
> > > > > > A simple lit test is as follows:
> > > > > > 
> > > > > > 
> > > > > > ```
> > > > > > ; RUN: %clang_cc1 -triple amdgcn -S -o - %s| FileCheck %s
> > > > > > 
> > > > > > define amdgpu_kernel void @f(double addrspace(1)* nocapture %out, 
> > > > > > double %a, double %b, double %c) local_unnamed_addr #0 {
> > > > > > entry:
> > > > > >   ; CHECK: v_fma_f64
> > > > > >   %0 = tail call double @llvm.fmuladd.f64(double %b, double %c, 
> > > > > > double %a)
> > > > > >   store double %0, double addrspace(1)* %out, align 8, !tbaa !6
> > > > > >   ret void
> > > > > > }
> > > > > > 
> > > > > > declare double @llvm.fmuladd.f64(double, double, double) #1
> > > > > > 
> > > > > > attributes #0 = { nounwind 
> > > > > > "correctly-rounded-divide-sqrt-fp-math"="false" 
> > > > > > "disable-tail-calls"="false" "less-precise-fpmad"="false" 
> > > > > > "no-frame-pointer-elim"="false" "no-infs-fp-math"="false" 
> > > > > > "no-jump-tables"="false" "no-nans-fp-math"="false" 
> > > > > > "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" 
> > > > > > "stack-protector-buffer-size"="8" 
> > > > > > "target-features"="+fp64-fp16-denormals,-fp32-denormals" 
> > > > > > "unsafe-fp-math"="false" "use-soft-float"="false" }
> > > > > > attributes #1 = { nounwind readnone }
> > > > > > 
> > > > > > ```
> > > > > > The test passes before this change and fails after this change.
> > > > > I'm missing something here. We're calling for OpenCL:
> > > > > 
> > > > >   Opts.setDefaultFPContractMode(LangOptions::FPC_On);
> > > > > 
> > > > > which seems like it should change the result returned by 
> > > > > getDefaultFPContractMode(). Why does it not?
> > > > > 
> > > > The input to this test is LLVM IR, so 
> > > > `Opts.setDefaultFPContractMode(LangOptions::FPC_On)` is not executed.
> > > Seems like this is another aspect of the LTO problem. All of these things 
> > > need to be function attributes. This is also true to make LTO work.
> > > 
> > @hfinkel, what is the actual point of un-fusing the intrinsic in the BE 
> > with FPOpFusion::Strict?
> > 
> > I don't know if targets use the target-independent intrinsic to implement 
> > builtin support but if yes this would override the user choosing of FMA by 
> > directly specifying the builtin.
> > @hfinkel, what is the actual point of un-fusing the intrinsic in the BE 
> > with FPOpFusion::Strict?
> 
> I think this was added based on a different conception of how frontends would 
> use the intrinsic: that they'd always add it where the language-semantics 
> allow, and then the backend would either never use FMAs (strict), use FMAs 
> where profitable (standard), etc.
> 
> > I don't know if targets use the target-independent intrinsic to implement 
> > builtin support but if yes this would override the user choosing of FMA by 
> > directly specifying the builtin.
> 
> No target should use fmuladd to implement the builtin support for a 
> target-specific intrinsic. It should use the fma intrinsic would always maps 
> to an fma.
> I think this was added based on a different conception of how frontends would 
> use the intrinsic: that they'd always add it where the language-semantics 
> allow, and then the backend would either never use FMAs (strict), use FMAs 
> where profitable (standard), etc.

So would you support disabling un-fusion on Strict?  That should solve Yaxun's 
problem.

Also my mid-term goal is to completely remove the FPOpFusion target option in 
favor of the explicit representation in the IR: either the FMF or the direct 
use of the intrinsic. At that point the behavior would have to change anyway.

> No target should use fmuladd to implement the builtin support for a 
> target-specific intrinsic. It should use the fma intrinsic would always maps 
> to an fma.

Could you please rephrase this, I am not sure I understand?



Repository:
  rL LLVM

https://reviews.llvm.org/D31167



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to