Hi developers,

As you may know, mxnet currently has no fused RNN operator for CPU. This prevents users from migrating or deploying their models on CPU if those models are built with mxnet's fused RNN cell APIs. The feature disparity also makes the mxnet code base harder to maintain and makes it difficult to develop unit tests for this feature.
We are trying to fill this gap with self-implemented RNN operators and with MKL-DNN primitives. Currently, PR #10104<https://github.com/apache/incubator-mxnet/pull/10104> and PR #10311<https://github.com/apache/incubator-mxnet/pull/10311> have been submitted for the fused LSTM and GRU and are ready for review. Both inference and training are implemented for these two RNN variants, and we see more than a 2x performance improvement compared with unrolled LSTMCell and GRUCell. Recently, Intel released experimental RNN primitives in MKL-DNN, and we are planning to integrate those into mxnet as well.

A proposal describing what we have done and what we plan to do in the near future is available at the links below. Please feel free to leave any comments:

Mxnet wiki: https://cwiki.apache.org/confluence/display/MXNET/Fused+RNN+Operators+for+CPU
Google doc: https://docs.google.com/document/d/1XC_PmbSc7q6px22LIW3vwhbA_wmX8wRGLRnet3pMJrs/edit?usp=sharing

BTW, we are trying to enable more RNN-related models to verify the performance improvement and accuracy in real workloads. We would greatly appreciate it if anyone could point us to open-source models that use the fused RNN operators of mxnet. It seems that Sockeye from awslabs and DS2 from the mxnet example folder do not use them. Thanks in advance.

*we: the Intel team, cc'ed

-------------------------------
Best Regards,
LvTao