wpan11nv opened a new pull request #4736: [CodeGen][CUDA] Improve CUDA vectorizer URL: https://github.com/apache/incubator-tvm/pull/4736 - Fixes issues to enable fp16 vectorizer. Now correct packing and unpacking CUDA code will be emitted. Enabled more unit tests. - Do not emit code to read the first lane from an undef variable int _3; _3 = _3 & ~(0x000000ff << 0) | ... and emit the following code instead: _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0); Note that nvcc 10.2 is forgiving and emits the same code for both cases. A warning appears in test_codegen_cuda.py. Signed-off-by: Wei Pan <[email protected]> Thanks for contributing to TVM! Please refer to guideline https://docs.tvm.ai/contribute/ for useful information and tips. After the pull request is submitted, please request code reviews from [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers) by @ them in the pull request thread.
---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] With regards, Apache Git Services
