@safrooze Thanks for your use case. I have implemented a first version of the slice OP with MKL-DNN support. For the nChw16c format, which is the most widely used one, MKL-DNN speeds up the slice OP considerably: in the profile below, total slice time drops from 1145.891 ms to 507.673 ms, roughly 2.3x faster. We also found that, for nChw16c, the improvement grows with the input size. Please see the profile log below.
### **slice w/o MKL-DNN**

**Name**|**Total Count**|**Time (ms)**|**Min Time (ms)**|**Max Time (ms)**|**Avg Time (ms)**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
Reorder|202|2.808|0|1.318|0.0139
**slice**|**202**|**1145.891**|**5.357**|**6.295**|**5.6727**
Convolution|202|518.247|2.423|5.015|2.5656
CopyCPU2CPU|4|4.495|0.02|2.228|1.1237
Concat|202|352.702|1.668|4.333|1.746
\_full|2|0.023|0.011|0.012|0.0115
\_random\_uniform|4|19.74|0.386|9.484|4.935
\_zeros|8|6.206|0.003|2.733|0.7757
DeleteVariable|408|102.104|0.003|0.349|0.2503
ResourceParallelRandomSetSeed|2|6.704|3.351|3.353|3.352

### **slice w/ MKL-DNN**

**Name**|**Total Count**|**Time (ms)**|**Min Time (ms)**|**Max Time (ms)**|**Avg Time (ms)**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
Reorder|202|2.212|0|1.012|0.011
**slice**|**202**|**507.673**|**2.395**|**2.802**|**2.5132**
Convolution|202|520.934|2.372|4.951|2.5789
CopyCPU2CPU|4|5.424|0.023|2.689|1.356
Concat|202|332.056|1.601|2.755|1.6438
\_full|2|0.025|0.012|0.013|0.0125
\_random\_uniform|4|19.853|0.413|9.515|4.9633
\_zeros|8|8.877|0.004|4.09|1.1096
DeleteVariable|408|37.766|0.005|0.217|0.1833
ResourceParallelRandomSetSeed|2|7.638|3.818|3.82|3.819

[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12303 ]
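For a quick read of the two profiles, the slice rows can be compared directly. The snippet below is only a small arithmetic sketch: the numbers are copied from the **slice** rows above, while the names `slice_ms` and `speedup` are made up for illustration and are not part of MXNet or MKL-DNN.

```python
# Slice timings (ms) copied from the two profile tables in this comment.
# The dictionary layout and helper name are illustrative assumptions.
slice_ms = {
    "without_mkldnn": {"total": 1145.891, "avg": 5.6727},
    "with_mkldnn": {"total": 507.673, "avg": 2.5132},
}


def speedup(baseline_ms, optimized_ms):
    """Return how many times faster the optimized run is than the baseline."""
    return baseline_ms / optimized_ms


base = slice_ms["without_mkldnn"]
opt = slice_ms["with_mkldnn"]

total_speedup = speedup(base["total"], opt["total"])
avg_speedup = speedup(base["avg"], opt["avg"])
saved_ms = base["total"] - opt["total"]          # time saved over 202 calls
reduction_pct = 100.0 * saved_ms / base["total"]  # relative reduction

print(f"total speedup:   {total_speedup:.2f}x")
print(f"per-call speedup: {avg_speedup:.2f}x")
print(f"time saved:      {saved_ms:.3f} ms ({reduction_pct:.1f}% less)")
```

The other operators (Convolution, Concat, Reorder) stay within a few percent between the two runs, so the change in total slice time is the main difference this profile shows.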
