wpan11nv opened a new pull request #4736: [CodeGen][CUDA] Improve CUDA vectorizer
URL: https://github.com/apache/incubator-tvm/pull/4736
 
 
   - Fix issues that prevented the fp16 vectorizer from being enabled. Correct
     packing and unpacking CUDA code is now emitted, and more unit tests are
     enabled.
   
   - Do not emit code that reads the first lane from an undefined variable,
     such as:
   
     int _3;
     _3 = _3 & ~(0x000000ff << 0) | ...
   
     Instead, emit the following code:
   
     _3 = (((0x000000ff & (_1 >> 0))+(0x000000ff & (_2 >> 0))) << 0);
   
     Note that nvcc 10.2 is forgiving and emits the same code for both cases.
     A warning appears in test_codegen_cuda.py.
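
As a rough illustration of the second form (not the code generator itself), the following C sketch adds four 8-bit lanes packed into a 32-bit word. The helper name `add_packed_u8x4`, the use of `uint32_t`, and the extra 8-bit mask on each lane sum are assumptions made for this example. The first lane is assigned directly, so the result is never read before it is written; the remaining lanes use the clear-then-insert pattern.

```c
#include <stdint.h>

/* Hypothetical helper: lane-wise add of two 4 x 8-bit vectors packed in 32 bits. */
static uint32_t add_packed_u8x4(uint32_t a, uint32_t b) {
    uint32_t r;

    /* Lane 0: plain assignment, no read of the uninitialized result. */
    r = (0x000000ffu & ((0x000000ffu & (a >> 0)) + (0x000000ffu & (b >> 0)))) << 0;

    /* Lanes 1..3: clear the lane in r, then insert the new lane value. */
    for (int lane = 1; lane < 4; ++lane) {
        int shift = lane * 8;
        uint32_t sum = 0x000000ffu &
            ((0x000000ffu & (a >> shift)) + (0x000000ffu & (b >> shift)));
        r = (r & ~(0x000000ffu << shift)) | (sum << shift);
    }
    return r;
}
```

Each lane wraps around independently in this sketch, so a carry out of one lane does not leak into the next.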
   
   Signed-off-by: Wei Pan <[email protected]>
   
   Thanks for contributing to TVM! Please refer to the guidelines at
   https://docs.tvm.ai/contribute/ for useful information and tips. After the
   pull request is submitted, please request code reviews from
   [Reviewers](https://github.com/apache/incubator-tvm/blob/master/CONTRIBUTORS.md#reviewers)
   by @-mentioning them in the pull request thread.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
