reminisce commented on issue #7759: How to use cpu kernel of an operator (which 
provides only cpu implementation) in gpu context?
URL: 
https://github.com/apache/incubator-mxnet/issues/7759#issuecomment-327685471
 
 
   How are the operator's forward and backward kernels implemented? By kernel, 
I mean the function block that does parallel computation such as 
[this](https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/control_flow_op.h#L43).
 In MXNet, most simple operators share the same kernels between CPU and 
GPU. That's why there is a template argument called `xpu` in the Forward and 
Backward functions, as 
[this](https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/control_flow_op.h#L154).
 If your CPU kernel also works on GPU and you have defined the Forward and Backward 
functions with the `xpu` template argument, all you need to do is register the 
operator in a `.cu` file, as 
[this](https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/control_flow_op.cu#L29).
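   
   To illustrate the pattern, here is a minimal, self-contained sketch of a device-agnostic kernel and an `xpu`-templated Forward function. This is not actual MXNet code (real operators use `mxnet_op::Kernel<Op, xpu>::Launch` and NNVM registration macros); the `cpu`/`gpu` tag types, `clip_kernel`, `Launch`, and `ClipForward` names are all illustrative:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Device tag types standing in for mxnet::cpu / mxnet::gpu (assumption:
// simplified stand-ins, not the real MXNet types).
struct cpu {};
struct gpu {};

// A "kernel" in the MXNet sense: a struct whose static Map() computes one
// output element. Because Map() is device-agnostic, the same code can be
// launched on CPU (a plain loop) or GPU (a CUDA grid) by the framework.
struct clip_kernel {
  static void Map(std::size_t i, float* out, const float* in,
                  float lo, float hi) {
    float v = in[i];
    out[i] = v < lo ? lo : (v > hi ? hi : v);
  }
};

// Launcher, specialized per device. The CPU version is a serial loop here
// (MXNet parallelizes with OpenMP); a gpu specialization would launch a
// CUDA kernel over the same Map() body.
template <typename Kernel, typename xpu>
struct Launch;

template <typename Kernel>
struct Launch<Kernel, cpu> {
  template <typename... Args>
  static void Run(std::size_t n, Args... args) {
    for (std::size_t i = 0; i < n; ++i) Kernel::Map(i, args...);
  }
};

// The operator's Forward, templated on the device type as in
// control_flow_op.h. A .cc file instantiates the cpu version; a .cu file
// would instantiate Forward<gpu> from the exact same body.
template <typename xpu>
void ClipForward(const std::vector<float>& in, std::vector<float>& out,
                 float lo, float hi) {
  out.resize(in.size());
  Launch<clip_kernel, xpu>::Run(in.size(), out.data(), in.data(), lo, hi);
}
```

   The key design point is that only the `Launch` specialization knows about the device; the per-element `Map()` and the `Forward` body are written once.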
   
   Sometimes, complicated operators do not share the same kernels between CPU and 
GPU because the two devices call for different parallelization approaches, and each 
kernel has to be optimized for its own device.
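   
   In that case, the usual C++ technique is explicit template specialization: declare the `xpu`-templated function once and give each device its own definition. A hedged sketch (the `SumForward` name and both bodies are illustrative, not MXNet code):

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Simplified device tags (assumption: stand-ins for mxnet::cpu / mxnet::gpu).
struct cpu {};
struct gpu {};

// Generic case is declared but left undefined: every device must provide
// its own specialization, so using an unsupported device fails at link time.
template <typename xpu>
float SumForward(const std::vector<float>& in);

// CPU path: a simple sequential reduction.
template <>
float SumForward<cpu>(const std::vector<float>& in) {
  return std::accumulate(in.begin(), in.end(), 0.0f);
}

// GPU path (illustrative stub): in a real operator this specialization
// would live in a .cu file and use, e.g., a tree reduction across thread
// blocks rather than this placeholder loop.
template <>
float SumForward<gpu>(const std::vector<float>& in) {
  return std::accumulate(in.begin(), in.end(), 0.0f);
}
```

   Splitting the specializations across `.cc` and `.cu` files keeps the CPU build free of any CUDA dependency while both devices expose the same operator interface.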
 