@safrooze Thanks for sharing your use case. I have implemented a first version of an MKL-DNN-backed slice op.
For the nChw16c format, which is the most widely used, MKL-DNN delivers a substantial speedup for the slice op: total slice time drops from ~1146 ms to ~508 ms in the runs below.
Additionally, we found that with nChw16c, the larger the input, the bigger the improvement.
Please see the profiling logs below.


### **slice w/o MKL-DNN**
**Name**|**Total Count**|**Time (ms)**|**Min Time (ms)**|**Max Time (ms)**|**Avg Time (ms)**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
Reorder|202|2.808|0|1.318|0.0139
**slice**|**202**|**1145.891**|**5.357**|**6.295**|**5.6727**
Convolution|202|518.247|2.423|5.015|2.5656
CopyCPU2CPU|4|4.495|0.02|2.228|1.1237
Concat|202|352.702|1.668|4.333|1.746
\_full|2|0.023|0.011|0.012|0.0115
\_random\_uniform|4|19.74|0.386|9.484|4.935
\_zeros|8|6.206|0.003|2.733|0.7757
DeleteVariable|408|102.104|0.003|0.349|0.2503
ResourceParallelRandomSetSeed|2|6.704|3.351|3.353|3.352

### **slice w/ MKL-DNN**
**Name**|**Total Count**|**Time (ms)**|**Min Time (ms)**|**Max Time (ms)**|**Avg Time (ms)**
:-----:|:-----:|:-----:|:-----:|:-----:|:-----:
Reorder|202|2.212|0|1.012|0.011
**slice**|**202**|**507.673**|**2.395**|**2.802**|**2.5132**
Convolution|202|520.934|2.372|4.951|2.5789
CopyCPU2CPU|4|5.424|0.023|2.689|1.356
Concat|202|332.056|1.601|2.755|1.6438
\_full|2|0.025|0.012|0.013|0.0125
\_random\_uniform|4|19.853|0.413|9.515|4.9633
\_zeros|8|8.877|0.004|4.09|1.1096
DeleteVariable|408|37.766|0.005|0.217|0.1833
ResourceParallelRandomSetSeed|2|7.638|3.818|3.82|3.819
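
For reference, the `slice` rows of the two tables imply roughly a 2.26x speedup, both in total time and per call. A quick sanity check of the arithmetic (numbers copied from the tables above; this is just a sketch, not part of the patch):

```python
# Speedup implied by the two profiles above for the slice op.
baseline_total_ms = 1145.891   # slice total, w/o MKL-DNN
mkldnn_total_ms = 507.673      # slice total, w/ MKL-DNN
baseline_avg_ms = 5.6727       # slice avg per call, w/o MKL-DNN
mkldnn_avg_ms = 2.5132         # slice avg per call, w/ MKL-DNN

total_speedup = baseline_total_ms / mkldnn_total_ms
avg_speedup = baseline_avg_ms / mkldnn_avg_ms

print(f"total speedup: {total_speedup:.2f}x")     # ~2.26x
print(f"per-call speedup: {avg_speedup:.2f}x")    # ~2.26x
```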

[ Full content available at: https://github.com/apache/incubator-mxnet/issues/12303 ]