Menooker opened a new pull request #5601:
URL: https://github.com/apache/incubator-tvm/pull/5601
We add bfloat16 as a new type named "bf16" in the frontend and complete the LLVM backend for generating bf16 code.
* Use int16 as the storage type in LLVM
* Add legalization to enable computations on bf16
* Add runtime frontend support (e.g. allow converting a numpy uint16 array to a bf16 NDArray); see the sketch below
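As a rough, numpy-only sketch of the storage convention (this is not the code in this PR, and the helper names are made up for illustration): a bf16 value is just the upper 16 bits of an IEEE float32, so a plain uint16 array can carry bf16 data.

```python
import numpy as np

def np_float32_to_bf16_bits(arr):
    # Hypothetical helper: reinterpret float32 as uint32 and keep the top
    # 16 bits, which is the bf16 bit pattern (plain truncation here; the
    # actual cast rounds, see the casting section below).
    bits = np.asarray(arr, dtype=np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)

def np_bf16_bits_to_float32(bits):
    # Widening is exact: place the 16-bit pattern in the upper half of a
    # uint32 word and reinterpret it as float32.
    out = np.zeros(bits.shape, dtype=np.uint32)
    out[...] = bits
    out <<= 16
    return out.view(np.float32)

x = np.array([1.0, -2.5, 3.14159], dtype=np.float32)
bf16_bits = np_float32_to_bf16_bits(x)        # uint16 array carrying bf16 data
print(np_bf16_bits_to_float32(bf16_bits))     # values at bf16 precision
```

A uint16 buffer like this is what the 16-bit storage type holds; the exact NDArray conversion entry point is in the PR's runtime changes.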
# Details on legalization
Since most hardware has no native support for bf16 computation, we added a pass `BF16Legalization` that uses fp32 to compute on bf16 data. It inserts `cast_to_fp32()` before each Op involving bf16 operands, computes with the fp32 version of the Op, and finally inserts a `cast_to_bf16()` after each Op that is altered, e.g.
`add(a,b)` => `cast16(add(cast32(a), cast32(b)))`
We call this phase "BF16Promotion". It is a sub-pass of the `BF16Legalization` pass.
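A minimal sketch of this promotion rule on a toy expression tree (this is not TVM's actual pass or IR; the `Var`/`Op` classes and `promote` function below are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class Var:
    name: str
    dtype: str            # "bf16" or "fp32"

@dataclass
class Op:
    name: str             # e.g. "add", "neg", "cast32", "cast16"
    args: tuple
    dtype: str

def cast32(e):
    return Op("cast32", (e,), "fp32")

def cast16(e):
    return Op("cast16", (e,), "bf16")

def promote(e):
    """Rewrite every bf16 Op to compute in fp32:
    add(a, b) -> cast16(add(cast32(a), cast32(b)))."""
    if isinstance(e, Var):
        return e
    args = tuple(promote(a) for a in e.args)
    if e.dtype != "bf16":
        return Op(e.name, args, e.dtype)
    # cast bf16 operands up to fp32, compute in fp32, cast the result back
    args = tuple(cast32(a) if a.dtype == "bf16" else a for a in args)
    return cast16(Op(e.name, args, "fp32"))

a, b = Var("a", "bf16"), Var("b", "bf16")
promoted = promote(Op("add", (a, b), "bf16"))   # cast16(add(cast32(a), cast32(b)))
```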
We note that this adds redundant casting, e.g.
`add(a, neg(b))` => `cast16(add(cast32(a), cast32(cast16(neg(cast32(b))))))`
The pattern `cast32(cast16(some_fp32_value))` can be simplified to `some_fp32_value`.
Thus, we add an optimization pass after "BF16Promotion" in the `BF16Legalization` pass, which eliminates such redundant casts.
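A hedged sketch of that cleanup on the same toy representation as above (again, not the actual TVM pass):

```python
from dataclasses import dataclass

# Same toy classes as in the previous sketch.
@dataclass
class Var:
    name: str
    dtype: str

@dataclass
class Op:
    name: str
    args: tuple
    dtype: str

def simplify_casts(e):
    """Drop the redundant pattern cast32(cast16(x)) when x is already fp32."""
    if isinstance(e, Var):
        return e
    args = tuple(simplify_casts(a) for a in e.args)
    inner = args[0] if args else None
    if (e.name == "cast32" and isinstance(inner, Op)
            and inner.name == "cast16" and inner.args[0].dtype == "fp32"):
        return inner.args[0]            # cast32(cast16(x)) -> x
    return Op(e.name, args, e.dtype)

# add(a, neg(b)) after promotion:
#   cast16(add(cast32(a), cast32(cast16(neg(cast32(b))))))
# after simplify_casts:
#   cast16(add(cast32(a), neg(cast32(b))))
```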
After the `BF16Legalization` pass, there is no bf16-related computation left in the AST, except casting between fp32 and bf16, bf16 value comparison, and assignment.
# Casting between fp32 and bf16
We follow PyTorch's bf16 [casting](https://github.com/pytorch/pytorch/blob/master/c10/util/BFloat16.h) implementation.
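Roughly, narrowing uses the usual round-to-nearest-even bias on the raw bits. A numpy sketch of that rounding (our own simplification of the linked PyTorch header; its special NaN handling is omitted):

```python
import numpy as np

def fp32_to_bf16_round_nearest_even(arr):
    # Reinterpret float32 as uint32, add a rounding bias, keep the top 16 bits.
    # bias = 0x7FFF plus the lowest surviving bit, so that ties round to even.
    # (PyTorch's BFloat16.h additionally maps NaN to a fixed quiet-NaN pattern,
    # which is omitted in this sketch.)
    bits = np.asarray(arr, dtype=np.float32).view(np.uint32)
    rounding_bias = ((bits >> 16) & np.uint32(1)) + np.uint32(0x7FFF)
    return ((bits + rounding_bias) >> 16).astype(np.uint16)
```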