ptrendx commented on a change in pull request #16979: [Bugfix] [Numpy] Add `kAddTo` and `kNullOp` to Transpose
URL: https://github.com/apache/incubator-mxnet/pull/16979#discussion_r354438866
##########
File path: src/operator/tensor/pseudo2DTranspose_op-inl.cuh
##########
@@ -39,22 +39,29 @@ namespace mxnet {
namespace op {
namespace cuda {
-
-template <typename DType, typename CType>
+/*!
+ * \brief The `transpose_pseudo2D` kernel, based on the chosen vectorized type.
+ *        It transposes an array of shape (k, m, n) to (k, n, m).
+ * \param out Pointer to the output memory.
+ * \param inp Pointer to the input memory.
+ * \param m First of the tensor dimensions.
+ * \param n Second of the tensor dimensions.
+ * \param nIterY The number of iterations in the y-dim of the thread to cover all rows. (1-->m)
+ * \param nIterZ The number of iterations in the z-dim of the thread to cover all rows. (1-->m)
+ * \tparam DType Data type.
+ * \tparam CType The type used to load the data.
+ * \tparam TSR The vectorized ratio (TypeSizeRatio).
+ * \tparam is_addto Whether to perform out += transpose(data) or out = transpose(data).
+ */
+template <typename DType, typename CType, int TSR, bool is_addto>
__global__ void transpose_pseudo2D(DType* out, DType* inp,
const index_t m, const index_t n,
const index_t nIterY, const index_t nIterZ)
{
- const index_t TSR = sizeof(CType)/sizeof(DType); // TypeSizeRatio
Review comment:
Why did you move this to a template argument? It should be fine here (you can
even make it `constexpr`).
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services