reminisce commented on issue #7759: How to use cpu kernel of an operator (which provides only cpu implementation) in gpu context? URL: https://github.com/apache/incubator-mxnet/issues/7759#issuecomment-327685471

How are the operator's forward and backward kernels implemented? By kernel, I mean the function block that does the parallel computation, such as [this](https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/control_flow_op.h#L43). In MXNet, most simple operators share the same kernel between CPU and GPU; that is why there is a template argument called `xpu` in the Forward and Backward functions, as [here](https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/control_flow_op.h#L154). If your CPU kernel also works on the GPU and you defined the Forward and Backward functions using the `xpu` template argument, all you need to do is register the operator in a `.cu` file, as [here](https://github.com/apache/incubator-mxnet/blob/master/src/operator/tensor/control_flow_op.cu#L29).

Sometimes complicated operators do not share the same kernel between CPU and GPU, because the two devices call for different parallelization approaches and each kernel has to be optimized for its own device.
