masahi commented on issue #8296: URL: https://github.com/apache/tvm/issues/8296#issuecomment-926605871
@AndrewZhaoLuo I briefly looked at bfloat16. While fp16 vs bf16 makes no difference for the conversion pass, it seems it is going to take a lot of effort to compile and run a bf16 model end to end, for at least two reasons:

* The constant folding pass doesn't work on bfloat16 input.
* Numpy doesn't understand bfloat16, but some topi schedules (winograd conv) try to create a numpy array of type `out_dtype`, which in this case is bfloat16.

Since tensorcore can natively run bf16 workloads at the same rate as fp16, and bf16 on x86 servers is becoming a thing, it would be nice to have good support for bf16 across the stack in the future.
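For context, here is a minimal sketch of the second problem, assuming a schedule that allocates a numpy buffer using the workload's `out_dtype` string (the exact schedule code is not quoted here):

```python
import numpy as np

out_dtype = "bfloat16"  # hypothetical value a winograd schedule might receive

np.zeros((4, 4), dtype="float16")  # fine: numpy has a native fp16 dtype
try:
    np.zeros((4, 4), dtype=out_dtype)  # numpy has no bfloat16 dtype
except TypeError as e:
    print(e)  # data type 'bfloat16' not understood
```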

