LewisCrawford wrote:

> They should be writing the vector maximumnum/minimumnum intrinsics

Although `nvvm.fmax/fmin` are very close to `maximumnum/minimumnum`, their 
semantics differ slightly: the LLVM intrinsics depend on the global FTZ 
settings, whereas the nvvm versions encode per instruction whether FTZ is 
desired. As you can see in NVPTXTargetTransformInfo, we do translate to 
`llvm.maximumnum` whenever the FTZ semantics make this valid.
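
To make the FTZ point concrete, here is a minimal Python model (not LLVM or NVPTX code) of how a per-instruction flush-to-zero flag can change the result of a max. The helper names `flush_subnormal` and `fmax` are hypothetical. The sketch uses Python doubles as stand-ins for float32 values, ignores NaN handling, and assumes FTZ flushes subnormal inputs to a sign-preserving zero.

```python
import math

# Smallest positive normal float32 value (2**-126); any nonzero value with a
# smaller magnitude is subnormal in float32.
FLT_MIN_NORMAL = 2.0 ** -126

def flush_subnormal(x: float) -> float:
    """Model FTZ: replace a float32-subnormal value with a same-signed zero."""
    if x != 0.0 and abs(x) < FLT_MIN_NORMAL:
        return math.copysign(0.0, x)
    return x

def fmax(x: float, y: float, ftz: bool) -> float:
    """Hypothetical model of a max whose FTZ behaviour is chosen per call."""
    if ftz:
        x, y = flush_subnormal(x), flush_subnormal(y)
    return max(x, y)
```

In this model `fmax(1e-40, 0.0, ftz=False)` keeps the subnormal input, while `fmax(1e-40, 0.0, ftz=True)` flushes it to zero first, so the two forms are interchangeable only when the surrounding FTZ setting agrees with the per-instruction flag.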

Besides cases like `fmin/fmax` with instruction-level FTZ semantics, 
scalarizable vector forms of target-specific intrinsics would also be useful 
for:

- Instructions requiring instruction-level rounding modes (e.g. `nvvm_add_rn_f` 
etc.)
- Target-specific math function approximations (e.g. `nvvm_sin_approx_f`, 
`amdgcn_cos` etc.)
- Other target-specific instructions like `nvvm_fmin_ftz_xorsign_abs_f`, where 
the `xorsign_abs` pattern would require a chain of several generic LLVM 
instructions instead of using a single target-specific intrinsic.
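
As a rough illustration of why the `xorsign_abs` case needs a chain of generic operations, here is a small Python sketch. It assumes the semantics are "minimum of the absolute values, with the result's sign set to the XOR of the two input signs"; that reading, the function name `fmin_xorsign_abs`, and the omission of FTZ and NaN handling are all assumptions made for illustration, not a statement of the intrinsic's exact definition.

```python
import math

def fmin_xorsign_abs(a: float, b: float) -> float:
    """Hypothetical scalar model of an fmin with xorsign and abs modifiers."""
    # Step 1: XOR the sign bits of the two inputs (copysign also sees -0.0).
    negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)
    # Step 2: take the minimum of the absolute values.
    magnitude = min(abs(a), abs(b))
    # Step 3: apply the combined sign to the result.
    return -magnitude if negative else magnitude
```

Each numbered step corresponds to at least one separate generic operation (sign extraction and XOR, two absolute values, a minimum, a sign insertion), which is the chain a single target-specific intrinsic would replace.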

> Users should be using canonical, generic patterns which the backend can match 
> into target instructions.

Being able to write e.g. `nvvm_add_rz(<32 x float> %x, <32 x float> %y)` seems 
closer to this ideal of a generic pattern that can be matched into target 
instructions than requiring users to emit 64 `ExtractElement`s, 32 
`nvvm_add_rz`s, and 32 `InsertElement`s, even though it uses a target-specific 
vector intrinsic rather than a core LLVM one.
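
The instruction counts above can be checked with a short Python sketch that prints the scalarized form a user would otherwise have to emit by hand. The generator `scalarize_binary` is hypothetical, the emitted text is only IR-like pseudo-output, and the scalar intrinsic name passed in the usage example (`llvm.nvvm.add.rz.f`) is an assumption for illustration.

```python
def scalarize_binary(scalar_intrinsic: str, lanes: int,
                     elem_ty: str = "float") -> list[str]:
    """Emit IR-like text for a lane-by-lane expansion of a binary vector op."""
    vec_ty = f"<{lanes} x {elem_ty}>"
    lines = []
    acc = "poison"  # the result vector starts out undefined
    for i in range(lanes):
        # Two extracts per lane: one from each operand vector.
        lines.append(f"%x{i} = extractelement {vec_ty} %x, i32 {i}")
        lines.append(f"%y{i} = extractelement {vec_ty} %y, i32 {i}")
        # One scalar intrinsic call per lane.
        lines.append(f"%r{i} = call {elem_ty} @{scalar_intrinsic}"
                     f"({elem_ty} %x{i}, {elem_ty} %y{i})")
        # One insert per lane to rebuild the result vector.
        lines.append(f"%v{i} = insertelement {vec_ty} {acc}, "
                     f"{elem_ty} %r{i}, i32 {i}")
        acc = f"%v{i}"
    return lines
```

Calling `scalarize_binary("llvm.nvvm.add.rz.f", 32)` yields 128 lines in total: 64 extracts, 32 calls, and 32 inserts, compared with a single call if a vector form were available.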

https://github.com/llvm/llvm-project/pull/194783
_______________________________________________
cfe-commits mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
