[GitHub] [incubator-tvm] huochaitiantang opened a new pull request #5226: [CODEGEN][CUDA] Fix vector load

GitBox Fri, 03 Apr 2020 02:29:16 -0700

URL: https://github.com/apache/incubator-tvm/pull/5226
 
 
   * Fix the ordering of the high and low 16 bits in `__pack_half2`.
   * Stop emitting a vector load through an extra temporary (`_1` below) that is filled by separate statements before the vector store. The current codegen produces:
   ```
   int _1;
   int4 _2 = (make_int4)(
     ((((((int)blockIdx.x) * 64) + ((int)threadIdx.x)) - 3))+(16*0),
     ((((((int)blockIdx.x) * 64) + ((int)threadIdx.x)) - 3))+(16*1),
     ((((((int)blockIdx.x) * 64) + ((int)threadIdx.x)) - 3))+(16*2),
     ((((((int)blockIdx.x) * 64) + ((int)threadIdx.x)) - 3))+(16*3));

   _1=(((signed char*)A)[_2.x] << 0);
   _1=_1 & ~(0x000000ff << 8) |(((signed char*)A)[_2.y] << 8);
   _1=_1 & ~(0x000000ff << 16) |(((signed char*)A)[_2.z] << 16);
   _1=_1 & ~(0x000000ff << 24) |(((signed char*)A)[_2.w] << 24);
   (( int*)(( signed char*)B + (((((int)blockIdx.x) * 88) + (((int)threadIdx.x) * 4)))))[0] =
     (((((int)threadIdx.x) < 3) || (19 <= ((int)threadIdx.x))) ? (int)0 : _1);
   ```
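   The code above comes from a padding kernel. Because `_1` is computed by separate statements before the store, `A` is read at `_2.x`, `_2.y`, `_2.z`, and `_2.w` for every thread, including threads in the padding region where those indexes may be out of bounds. Instead, emit the load inline in the false branch of the conditional, so `A` is read only when the indexes are valid: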
   ```
   int4 _1 = (make_int4)(
     ((((((int)blockIdx.x) * 64) + ((int)threadIdx.x)) - 3))+(16*0),
     ((((((int)blockIdx.x) * 64) + ((int)threadIdx.x)) - 3))+(16*1),
     ((((((int)blockIdx.x) * 64) + ((int)threadIdx.x)) - 3))+(16*2),
     ((((((int)blockIdx.x) * 64) + ((int)threadIdx.x)) - 3))+(16*3));

   (( int*)(( signed char*)B + (((((int)blockIdx.x) * 88) + (((int)threadIdx.x) * 4)))))[0] =
     (((((int)threadIdx.x) < 3) || (19 <= ((int)threadIdx.x)))
     ? (int)0
     : ((0x000000ff << 0) & (((signed char*)A)[_1.x] << 0))|
       ((0x000000ff << 8) & (((signed char*)A)[_1.y] << 8))|
       ((0x000000ff << 16) & (((signed char*)A)[_1.z] << 16))|
       ((0x000000ff << 24) & (((signed char*)A)[_1.w] << 24)));
   ```
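   To make the first bullet concrete, here is a minimal host-side C sketch of the bit layout a packed `half2` is assumed to need: the first half (`x`) in the low 16 bits and the second (`y`) in the high 16 bits, so that on a little-endian machine `.x` sits at the lower address. The halves are modeled as raw `uint16_t` bit patterns, and `half2_bits` and `pack_half2_bits` are hypothetical names, not TVM's actual `__pack_half2`.

   ```c
   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>
   #include <string.h>

   /* Hypothetical model of a half2: two raw fp16 bit patterns, .x first in memory. */
   typedef struct {
     uint16_t x;
     uint16_t y;
   } half2_bits;

   _Static_assert(sizeof(half2_bits) == 4, "half2_bits must be 32 bits");

   /* Hypothetical helper: x goes in the low 16 bits, y in the high 16 bits. */
   static uint32_t pack_half2_bits(uint16_t x, uint16_t y) {
     return ((uint32_t)y << 16) | (uint32_t)x;
   }

   int main(void) {
     uint16_t one = 0x3C00; /* fp16 bit pattern of 1.0 */
     uint16_t two = 0x4000; /* fp16 bit pattern of 2.0 */
     uint32_t packed = pack_half2_bits(one, two);
     assert(packed == 0x40003C00u);

     /* View the packed word as a half2-like struct. On a little-endian machine
      * (assumed here) the low 16 bits sit at the lower address, i.e. in .x. */
     half2_bits h;
     memcpy(&h, &packed, sizeof h);
     assert(h.x == one && h.y == two);

     printf("packed=0x%08X x=0x%04X y=0x%04X\n",
            (unsigned)packed, (unsigned)h.x, (unsigned)h.y);
     return 0;
   }
   ```

   If the two halves were packed the other way round, `.x` would read back as `y`, which is one way a high/low ordering bug shows up.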
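   As a host-side illustration of the second bullet, the following minimal C sketch models what one thread computes with the proposed codegen, assuming block 0, 64 bytes of `A` per block (from `blockIdx.x * 64`), and the `[3, 19)` bounds taken from the kernel's condition. `load_padded` and `pack_bytes` are hypothetical names. Since C evaluates only the selected operand of `?:`, the four reads of `A` happen only for threads outside the padding region. Standard C leaves left-shifting a negative `int` undefined, so the sketch converts each byte to `uint8_t` before shifting, which yields the same bits as the `0x000000ff << n` masks.

   ```c
   #include <assert.h>
   #include <stdint.h>
   #include <stdio.h>

   /* Pack four bytes into one 32-bit word, byte k landing in bits [8k, 8k+8).
    * Converting to uint8_t keeps only the low 8 bits, like the
    * (0x000000ff << n) mask, without shifting a negative signed value. */
   static uint32_t pack_bytes(signed char b0, signed char b1,
                              signed char b2, signed char b3) {
     return ((uint32_t)(uint8_t)b0 << 0) |
            ((uint32_t)(uint8_t)b1 << 8) |
            ((uint32_t)(uint8_t)b2 << 16) |
            ((uint32_t)(uint8_t)b3 << 24);
   }

   /* Hypothetical model of one thread: block ~ blockIdx.x, tid ~ threadIdx.x. */
   static uint32_t load_padded(const signed char *A, int block, int tid) {
     int base = block * 64 + tid - 3;
     /* Lanes of the int4 index vector: base + 16*k (pure arithmetic, no loads). */
     int idx[4] = {base + 16 * 0, base + 16 * 1, base + 16 * 2, base + 16 * 3};
     /* Same guard as the generated code: padding threads yield 0 and
      * never evaluate the reads from A. */
     return (tid < 3 || 19 <= tid)
                ? 0u
                : pack_bytes(A[idx[0]], A[idx[1]], A[idx[2]], A[idx[3]]);
   }

   int main(void) {
     signed char A[64];
     for (int i = 0; i < 64; ++i) A[i] = (signed char)(i - 32); /* includes negative bytes */

     /* tid = 0: base index is -3, so an unguarded load would read A[-3]. */
     assert(load_padded(A, 0, 0) == 0u);

     /* tid = 3: lanes read A[0], A[16], A[32], A[48] = -32, -16, 0, 16. */
     uint32_t w = load_padded(A, 0, 3);
     assert(w == 0x1000F0E0u);
     printf("0x%08X\n", (unsigned)w);
     return 0;
   }
   ```

   In the old form, the loads into `_1` run before the conditional is evaluated, so the equivalent of the `tid = 0` case above would dereference `A[-3]`.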
   @vinx13, could you please help review? Thanks!

