DickJC123 commented on a change in pull request #20324:
URL: https://github.com/apache/incubator-mxnet/pull/20324#discussion_r827454218
##########
File path: src/operator/tensor/elemwise_unary_op_basic.cu
##########
@@ -115,7 +115,10 @@ void ShapeComputeGPU(const nnvm::NodeAttrs& attrs,
mshadow::Stream<gpu>::GetStream(s));
}
-NNVM_REGISTER_OP(shape_array).set_attr<FCompute>("FCompute<gpu>", ShapeComputeGPU);
+NNVM_REGISTER_OP(shape_array)
Review comment:
I'm not sure whether I saw a failure, or whether I marked it
incompatible on general principles. But basically, this op does a cpu->gpu
copy:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/elemwise_unary_op_basic.cu#L111-L115
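For reference, here is a minimal standalone sketch of the pattern those lines use (this is not the MXNet source; `host_shape` and `dev_out` are hypothetical stand-ins for `in_data.shape_.data()` and `out_data.dptr_`):

```cpp
// Not MXNet code: a self-contained sketch of the pattern the linked lines use.
// The op's output is the input's shape, which lives in *host* memory, so
// producing it on the GPU means a host->device async copy on the op's stream.
#include <cuda_runtime.h>
#include <cstdint>

// host_shape ~ in_data.shape_.data()  (CPU pointer into the shape's dims)
// dev_out    ~ out_data.dptr_         (GPU pointer of the output tensor)
void shape_to_gpu(const int64_t* host_shape, int ndim,
                  int64_t* dev_out, cudaStream_t stream) {
  cudaMemcpyAsync(dev_out, host_shape, ndim * sizeof(int64_t),
                  cudaMemcpyHostToDevice, stream);
}
```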
Now, for CUDA Graphs to function, we rely on the tensor's gpu dptr remaining unchanged, but what about the TBlob instance itself? ShapeComputeGPU() receives a reference to the inputs and does not copy the TBlob, but can the op rely on the TBlob not being copied at some higher level of graph handling? A copy of the input TBlob instance would preserve the dptr, but it would change the cpu-memory pointer value returned by in_data.shape_.data().
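To make the hazard concrete, here is a minimal, self-contained sketch (plain CUDA, not MXNet code; `shape_a`, `shape_b`, `dev_out`, and `CHECK_CUDA` are hypothetical names) showing that a graph captured around such a copy bakes in the *host* source address at capture time, so if the shape data later lives at a different address (as it would after a TBlob/TShape copy), the replay still reads the old one:

```cpp
#include <cuda_runtime.h>
#include <cstdint>
#include <cstdio>
#include <cstring>

#define CHECK_CUDA(call)                                           \
  do {                                                             \
    cudaError_t err_ = (call);                                     \
    if (err_ != cudaSuccess) {                                     \
      std::printf("CUDA error: %s\n", cudaGetErrorString(err_));   \
      return 1;                                                    \
    }                                                              \
  } while (0)

int main() {
  cudaStream_t stream;
  CHECK_CUDA(cudaStreamCreate(&stream));

  // shape_a plays the role of the original TBlob's shape_.data() buffer.
  int64_t* shape_a = nullptr;
  CHECK_CUDA(cudaMallocHost(reinterpret_cast<void**>(&shape_a),
                            3 * sizeof(int64_t)));  // pinned host memory
  shape_a[0] = 2; shape_a[1] = 3; shape_a[2] = 4;

  int64_t* dev_out = nullptr;  // plays the role of out_data.dptr_
  CHECK_CUDA(cudaMalloc(reinterpret_cast<void**>(&dev_out),
                        3 * sizeof(int64_t)));

  // Capture the H2D copy into a graph, as a CUDA-Graphs-enabled executor would.
  cudaGraph_t graph;
  cudaGraphExec_t graph_exec;
  CHECK_CUDA(cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal));
  CHECK_CUDA(cudaMemcpyAsync(dev_out, shape_a, 3 * sizeof(int64_t),
                             cudaMemcpyHostToDevice, stream));
  CHECK_CUDA(cudaStreamEndCapture(stream, &graph));
  // CUDA 11 signature; CUDA 12 takes a flags argument instead of the
  // error-node/log-buffer arguments.
  CHECK_CUDA(cudaGraphInstantiate(&graph_exec, graph, nullptr, nullptr, 0));

  // Simulate a higher level of graph handling copying the TBlob: the shape
  // values now live at a different host address (shape_b), and the old
  // buffer's contents are no longer meaningful.
  int64_t* shape_b = nullptr;
  CHECK_CUDA(cudaMallocHost(reinterpret_cast<void**>(&shape_b),
                            3 * sizeof(int64_t)));
  std::memcpy(shape_b, shape_a, 3 * sizeof(int64_t));
  shape_a[0] = -1; shape_a[1] = -1; shape_a[2] = -1;  // stale/garbage

  // Replay: the captured memcpy node still reads from shape_a, not shape_b.
  CHECK_CUDA(cudaGraphLaunch(graph_exec, stream));
  CHECK_CUDA(cudaStreamSynchronize(stream));

  int64_t result[3];
  CHECK_CUDA(cudaMemcpy(result, dev_out, sizeof(result),
                        cudaMemcpyDeviceToHost));
  // Prints -1 -1 -1, not 2 3 4: the replay ignored the relocated shape data.
  std::printf("%lld %lld %lld\n", (long long)result[0],
              (long long)result[1], (long long)result[2]);
  return 0;
}
```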
So bottom line: declaring the Op 'CUDA Graphs compatible' might work, but it relies on a different 'contract' with the rest of the software than the one covering tensor layouts in gpu memory.