[PATCH] D113107: Support of expression granularity for _Float16.

Phoebe Wang via Phabricator via cfe-commits Tue, 16 Nov 2021 19:37:42 -0800

pengfei added inline comments.


================
Comment at: clang/lib/CodeGen/CGExprScalar.cpp:1315
+  if ((SrcType->isHalfType() || iSFloat16Allowed) &&
+      !CGF.getContext().getLangOpts().NativeHalfType) {
     // Cast to FP using the intrinsic if the half type itself isn't supported.
----------------
zahiraam wrote:
> zahiraam wrote:
> > pengfei wrote:
> > > rjmccall wrote:
> > > > pengfei wrote:
> > > > > rjmccall wrote:
> > > > > > pengfei wrote:
> > > > > > > rjmccall wrote:
> > > > > > > > Okay, this condition is pretty ridiculous to be repeating in 
> > > > > > > > three different places across the compiler.  Especially since 
> > > > > > > > you're going to change it when you implement the new option, 
> > > > > > > > right?
> > > > > > > > 
> > > > > > > > Can we state this condition more generally?  I'm not sure why 
> > > > > > > > this is so narrowly restricted, and the variable name isn't 
> > > > > > > > telling me anything, since `_Float16` must by definition be 
> > > > > > > > "allowed" if we have an expression of `_Float16` type.
> > > > > > > > since _Float16 must by definition be "allowed" if we have an 
> > > > > > > > expression of _Float16 type.
> > > > > > > 
> > > > > > > _Float16 is allowed only on a few targets. 
> > > > > > > https://clang.llvm.org/docs/LanguageExtensions.html#half-precision-floating-point
> > > > > > > By the way, we should update for X86 since it's not limited to 
> > > > > > > avx512fp16 now.
> > > > > > > _Float16 is allowed only on a few targets.
> > > > > > 
> > > > > > Yes, I know that.  But if `SrcType->isFloat16Type()` is true, we 
> > > > > > must be on one of those targets, because the type doesn't otherwise 
> > > > > > exist.
> > > > > I see your point now. The problem here is we want to allow the 
> > > > > `_Float16` to be used more broadly. But the target doesn't really 
> > > > > support it sometime. Currently full arithmatic operations are 
> > > > > supported only on target with AVX512FP16.
> > > > > We should cast for those targets without AVX512FP16 while avoid to do 
> > > > > on AVX512FP16.
> > > > I agree that many targets don't natively support arithmetic on this 
> > > > format, but x86 is not the first one that does.  Unless I'm 
> > > > misunderstanding, we already track this property in Clang's TargetInfo 
> > > > as `hasLegalHalfType()`.  `+avx512fp16` presumably ought to set this.
> > > > 
> > > > I'm not sure what the interaction with the `NativeHalfType` LangOpt is 
> > > > supposed to be here.  My understanding is that that option is just 
> > > > supposed to affect `__fp16`, basically turning it into a proper 
> > > > arithmetic type, i.e. essentially `_Float16`.  Whatever effect you want 
> > > > to apply to `_Float16` should presumably happen even if that option not 
> > > > set.
> > > > 
> > > > More broadly, I don't think your approach in this patch is correct.  
> > > > The type of operations on `_Float16` should not change based on whether 
> > > > the target natively supports `_Float16`.  If we need to emulate those 
> > > > operations on targets that don't provide them natively, we should do 
> > > > that at a lower level than the type system.
> > > > 
> > > > The most appropriate place to do that is going to depend on the exact 
> > > > semantics we want.
> > > > 
> > > > If we want to preserve `half` semantics exactly regardless of target, 
> > > > we should have Clang's IR generation actually emit `half` operations.  
> > > > Targets that don't support those operations natively will have to lower 
> > > > at least some of those operations into compiler-rt calls, but that's 
> > > > not at all unprecedented.
> > > > 
> > > > If we're okay with playing loose for performance reasons, we can 
> > > > promote to `float` immediately around individual arithmetic operations. 
> > > >  IR generation is probably the most appropriate place to do that.  But 
> > > > I'm quite concerned about that making `_Float16` feel like an 
> > > > unpredictable/unportable type; it seems to me that software emulation 
> > > > is much better.
> > > > 
> > > > If you're proposing the latter, I think you need to raise that more 
> > > > widely than a code review; please make a post on llvm-dev.
> > > > we already track this property in Clang's TargetInfo as 
> > > > `hasLegalHalfType()`
> > > 
> > > That sounds a good approch. Thank you.
> > > 
> > > > The type of operations on `_Float16` should not change based on whether 
> > > > the target natively supports `_Float16`. If we need to emulate those 
> > > > operations on targets that don't provide them natively, we should do 
> > > > that at a lower level than the type system.
> > > 
> > > Unfortunately, we can't do it at low level. The reason is (I'm not expert 
> > > in frontend, just recalled from last disscussion with GCC folks) we have 
> > > to do expresssion emulation to respect C/C++ semantics. GCC has option 
> > > `-fexcess-precision=16` to match the same result with native 
> > > instructions, but the default is `-fexcess-precision=fast` according to 
> > > language semantics.
> > > 
> > > > The most appropriate place to do that is going to depend on the exact 
> > > > semantics we want...
> > > 
> > > Note, we are not simply doing emulation in the frontend. It's backend's 
> > > responsibility to emulate a single `half` operation. But it's frontend's 
> > > responsibility to choose whether to emit several `half` operations or 
> > > emit promote + several `float` operations + truncate. As described in the 
> > > title, this patch is doing for the latter.
> > > Okay, this condition is pretty ridiculous to be repeating in three 
> > > different places across the compiler.  Especially since you're going to 
> > > change it when you implement the new option, right?
> > > 
> > > Can we state this condition more generally?  I'm not sure why this is so 
> > > narrowly restricted, and the variable name isn't telling me anything, 
> > > since `_Float16` must by definition be "allowed" if we have an expression 
> > > of `_Float16` type.
> > 
> > I agree  :)  I wasn't aware of this flag. I am still in the process of 
> > figuring out how float16 work and what is exactly required for it.
> > Yes may be having 2 separate patches is the right way to go. One that turns 
> > Float16 on for x86 without the use of +avx512fp16 feature and another one 
> > that does the emulation ( half to float -> float arithmetic-> truncation to 
> > half). @pengfei ?
> > > we already track this property in Clang's TargetInfo as 
> > > `hasLegalHalfType()`
> > 
> > That sounds a good approch. Thank you.
> > 
> > > The type of operations on `_Float16` should not change based on whether 
> > > the target natively supports `_Float16`. If we need to emulate those 
> > > operations on targets that don't provide them natively, we should do that 
> > > at a lower level than the type system.
> > 
> > Unfortunately, we can't do it at low level. The reason is (I'm not expert 
> > in frontend, just recalled from last disscussion with GCC folks) we have to 
> > do expresssion emulation to respect C/C++ semantics. GCC has option 
> > `-fexcess-precision=16` to match the same result with native instructions, 
> > but the default is `-fexcess-precision=fast` according to language 
> > semantics.
> > 
> > > The most appropriate place to do that is going to depend on the exact 
> > > semantics we want...
> > 
> > Note, we are not simply doing emulation in the frontend. It's backend's 
> > responsibility to emulate a single `half` operation. But it's frontend's 
> > responsibility to choose whether to emit several `half` operations or emit 
> > promote + several `float` operations + truncate. As described in the title, 
> > this patch is doing for the latter.
> 
> The _Float16 type is supported for both C and C++, on x86 systems with SSE2 
> enabled. 
> From https://gcc.gnu.org/onlinedocs/gcc/Half-Precision.html:
> "On x86 targets with SSE2 enabled, **without -mavx512fp16, all operations 
> will be emulated by software emulation** and the float instructions. The 
> default behavior for FLT_EVAL_METHOD is to keep the intermediate result of 
> the operation as 32-bit precision. This may lead to inconsistent behavior 
> between software emulation and AVX512-FP16 instructions. Using 
> -fexcess-precision=16 will force round back after each operation.
> 
> Using -mavx512fp16 will generate AVX512-FP16 instructions instead of software 
> emulation. The default behavior of FLT_EVAL_METHOD is to round after each 
> operation. The same is true with -fexcess-precision=standard and 
> -mfpmath=sse. If there is no -mfpmath=sse, -fexcess-precision=standard alone 
> does the same thing as before, It is useful for code that does not have 
> _Float16 and runs on the x87 FPU."
> This is what we are trying to reach out with this patch.
> 
I planed to enable Float16 and emulation in one patch, i.e., D107082. It's 
almost backend work. The only thing I can't do in that patch is emulating in 
C/C++ expression granularity, which has to leverage FE facilities. That's all 
what I expected for this patch.
I'm ashamed I haven't finished it yet and leaving confusion here :(


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D113107/new/

https://reviews.llvm.org/D113107

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D113107: Support of expression granularity for _Float16.

Reply via email to