DickJC123 commented on a change in pull request #20324:
URL: https://github.com/apache/incubator-mxnet/pull/20324#discussion_r827454218



##########
File path: src/operator/tensor/elemwise_unary_op_basic.cu
##########
@@ -115,7 +115,10 @@ void ShapeComputeGPU(const nnvm::NodeAttrs& attrs,
                   mshadow::Stream<gpu>::GetStream(s));
 }
 
-NNVM_REGISTER_OP(shape_array).set_attr<FCompute>("FCompute<gpu>", ShapeComputeGPU);
+NNVM_REGISTER_OP(shape_array)

Review comment:
       I'm not sure whether I saw an actual failure or whether I marked it
incompatible on general principles. Either way, this op performs a cpu->gpu
copy:
https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/elemwise_unary_op_basic.cu#L111-L115
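
   Roughly, the copy looks like the sketch below (a minimal approximation, not
the exact MXNet source; the parameter names are placeholders standing in for
`in_data.shape_.data()` and the output's gpu dptr). The point is that the
*cpu-side* pointer to the shape data is the copy source:

```cpp
// Minimal sketch of the kind of host->device copy ShapeComputeGPU issues.
// shape_cpu stands in for in_data.shape_.data(), shape_gpu for the output dptr.
#include <cuda_runtime.h>
#include <cstdint>

void shape_copy_sketch(const int64_t* shape_cpu,  // cpu pointer to shape data
                       int64_t* shape_gpu,        // gpu destination buffer
                       int ndim,
                       cudaStream_t stream) {
  cudaMemcpyAsync(shape_gpu, shape_cpu, ndim * sizeof(int64_t),
                  cudaMemcpyHostToDevice, stream);
}
```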
   
   Now, for CUDA Graphs to function we rely on the fact that the tensor's gpu
dptr remains unchanged, but what about the TBlob instance itself?
ShapeComputeGPU() receives a reference to the inputs and does not copy the
TBlob, but can the op rely on the TBlob not being copied at some higher level
of graph handling? A copy of the input TBlob instance would preserve the dptr
but would change the cpu-memory pointer value returned by
in_data.shape_.data().
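
   To spell out why that cpu pointer matters (a self-contained sketch under
assumed setup, not MXNet code): during stream capture the host source address
is recorded into the graph node, so a later replay re-reads that same host
address even if the framework has since handed the op a copied TBlob whose
shape_.data() lives at a different address.

```cpp
// Sketch: the host source address passed during capture is baked into the graph.
#include <cuda_runtime.h>
#include <cstdint>

int main() {
  int64_t shape_host[2] = {4, 8};          // plays the role of the cpu shape data
  int64_t* shape_dev = nullptr;
  cudaMalloc(&shape_dev, sizeof(shape_host));

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  cudaGraph_t graph;
  cudaStreamBeginCapture(stream, cudaStreamCaptureModeGlobal);
  // The address of shape_host is recorded in the captured memcpy node here.
  cudaMemcpyAsync(shape_dev, shape_host, sizeof(shape_host),
                  cudaMemcpyHostToDevice, stream);
  cudaStreamEndCapture(stream, &graph);

  cudaGraphExec_t exec;
  cudaGraphInstantiate(&exec, graph, nullptr, nullptr, 0);
  cudaGraphLaunch(exec, stream);   // replay copies from the *recorded* host address
  cudaStreamSynchronize(stream);

  cudaGraphExecDestroy(exec);
  cudaGraphDestroy(graph);
  cudaFree(shape_dev);
  cudaStreamDestroy(stream);
  return 0;
}
```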
   
   So, bottom line: I think it might work to declare this op 'CUDA Graphs
compatible', but doing so relies on a different 'contract' with the rest of the
software than the one covering tensor layouts in gpu memory.



