ptrendx commented on a change in pull request #16979: [Bugfix] [Numpy] Add
`kAddTo` and kNullOp to Transpose
URL: https://github.com/apache/incubator-mxnet/pull/16979#discussion_r355707590
##########
File path: src/operator/tensor/pseudo2DTranspose_op-inl.cuh
##########
@@ -78,23 +85,34 @@ __global__ void transpose_pseudo2D(DType* out, DType* inp,
}
__syncthreads();
- // read from shared to registers
- transp_t tmp[TSR];
+ // read from shared to local registers
+ CType tmp[TSR];
#pragma unroll
for (index_t i = 0; i < TSR; i++) {
+ DType* tmp_dptr = reinterpret_cast<DType*>(&tmp[i]);
Review comment:
Hi, I only now had a chance to look into this. No, there is no problem.
I was worried that the compiler could get confused by this cast and put the `tmp`
array in local memory instead of registers, but I tested it and it does not.
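
For reference, here is a minimal sketch of the pattern in question, not the code from this PR and not necessarily how the test above was done. It assumes `float2` as a stand-in for `CType` and `float` for `DType`; the names `kVecs`, `kLanes`, `scale_via_alias` and `run_check` are hypothetical. A per-thread array can only stay in registers if every index into it is known at compile time, so both loops are fully unrolled and the aliasing pointer is only indexed with unrolled loop counters.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kVecs = 4;   // hypothetical stand-in for TSR
constexpr int kLanes = 2;  // number of floats packed in one float2

// Each thread loads kVecs float2 values, doubles every float through a
// float* alias into the per-thread array, and stores the result.
__global__ void scale_via_alias(float2* out, const float2* inp, int n_vec) {
  int base = (blockIdx.x * blockDim.x + threadIdx.x) * kVecs;
  if (base + kVecs > n_vec) return;
  float2 tmp[kVecs];  // intended to live in registers
#pragma unroll
  for (int i = 0; i < kVecs; i++) {
    tmp[i] = inp[base + i];
    // Same shape as the cast in the diff: alias one vector element as scalars.
    float* tmp_ptr = reinterpret_cast<float*>(&tmp[i]);
#pragma unroll
    for (int j = 0; j < kLanes; j++) {
      tmp_ptr[j] *= 2.0f;  // j is a compile-time constant after unrolling
    }
    out[base + i] = tmp[i];
  }
}

// Host-side helper: runs the kernel on a small input and verifies the output.
bool run_check() {
  const int n_vec = 256;
  const int threads = n_vec / kVecs;  // 64 threads, one block
  std::vector<float2> h_in(n_vec), h_out(n_vec);
  for (int i = 0; i < n_vec; i++) h_in[i] = make_float2(i, i + 0.5f);

  float2* d_in = nullptr;
  float2* d_out = nullptr;
  const size_t bytes = n_vec * sizeof(float2);
  if (cudaMalloc(&d_in, bytes) != cudaSuccess) return false;
  if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return false; }
  cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
  scale_via_alias<<<1, threads>>>(d_out, d_in, n_vec);
  // This blocking copy also waits for the kernel to finish.
  cudaError_t err = cudaMemcpy(h_out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (int i = 0; i < n_vec; i++) {
    if (h_out[i].x != 2.0f * i || h_out[i].y != 2.0f * i + 1.0f) return false;
  }
  return true;
}
```

Compiling a file like this with `nvcc -Xptxas -v` prints per-kernel stack frame and spill statistics. A nonzero stack frame for `scale_via_alias` would suggest that `tmp` ended up in local memory. Inspecting the SASS from `cuobjdump -sass` for local loads and stores is another way to check.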