AndrewZhaoLuo commented on issue #8294: URL: https://github.com/apache/tvm/issues/8294#issuecomment-866176947
Hmm, yeah, the problem has to do with what I said. Specifically, in `python/tvm/topi/cuda/conv2d_winograd.py` the Winograd matrix `G` is cast to the output dtype while the kernel isn't, so there is a type mismatch. In general, it seems reasonable to have implicit type promotion to the higher-bit floating-point type. Furthermore, it might also be good for most binary arithmetic ops to have output_dtypes. E.g., right now there isn't a good way to represent adding two fp16 numbers into an fp32 result. Later NVIDIA GPUs support this as a more primitive operation, so maybe we should have a better representation.
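To illustrate why an fp16 + fp16 -> fp32 add differs from a plain fp16 add, here is a minimal sketch in plain Python. It does not use any TVM API; the helper names (`round_to_fp16`, `round_to_fp32`, `add_fp16`) and the `out_dtype` parameter are hypothetical and exist only for this example. It relies on the standard library `struct` module's `e` (IEEE 754 half) and `f` (single) formats to emulate rounding.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def add_fp16(a: float, b: float, out_dtype: str = "float16") -> float:
    """Add two fp16 inputs and round the sum to `out_dtype`.

    Hypothetical helper: `out_dtype` mimics the kind of output-dtype
    parameter discussed above; it is not an existing TVM argument.
    """
    a16, b16 = round_to_fp16(a), round_to_fp16(b)
    exact = a16 + b16  # the sum of two fp16 values is exact in a Python float
    if out_dtype == "float16":
        return round_to_fp16(exact)
    if out_dtype == "float32":
        return round_to_fp32(exact)
    raise ValueError(f"unsupported out_dtype: {out_dtype}")


# fp16 values near 4096 are spaced 4 apart, so the +1 is lost in an fp16 result.
print(add_fp16(4096.0, 1.0, out_dtype="float16"))  # 4096.0
# The same two fp16 inputs keep the +1 when the result is fp32.
print(add_fp16(4096.0, 1.0, out_dtype="float32"))  # 4097.0
```

Casting the fp16 result to fp32 afterwards would not recover the lost bit, which is why the output dtype has to be part of the operation itself.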
