bgawrych opened a new pull request #20163:
URL: https://github.com/apache/incubator-mxnet/pull/20163


   ## Description ##
   This change adds oneDNN support for two operators:
   - _contrib_interleaved_matmul_selfatt_qk
   - _contrib_interleaved_matmul_selfatt_valatt
   
   Both operators are used when the MKLDNN or MKLDNN_QUANTIZE backend is 
selected. There is no performance difference between MKL fp32 and oneDNN 
fp32; the main advantage is support for the int8 data type.
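
   As a minimal sketch of backend selection: MXNet's documented `MXNET_SUBGRAPH_BACKEND` environment variable chooses the subgraph backend for inference (the MKLDNN_QUANTIZE backend is normally selected internally by the quantization flow rather than set by hand):

```shell
# Select the MKLDNN subgraph backend before running inference.
export MXNET_SUBGRAPH_BACKEND=MKLDNN
echo "$MXNET_SUBGRAPH_BACKEND"
```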
   
   10 iterations of BERT-Large (gluon-nlp v0.10.x) [Intel(R) Core(TM) i9-9940X 
CPU @ 3.30GHz]:
   
   **MKL implementation (fp32 as int8 is not supported):**
   
![image](https://user-images.githubusercontent.com/59644968/114681509-84ea9080-9d0e-11eb-8c92-5d91827907b9.png)
   
   **oneDNN implementation (int8):**
   
![image](https://user-images.githubusercontent.com/59644968/114681535-8a47db00-9d0e-11eb-9a1d-868bf9c1b890.png)
   
   _contrib_interleaved_matmul_selfatt_qk => _sg_mkldnn_selfatt_qk
   _contrib_interleaved_matmul_selfatt_valatt => _sg_mkldnn_selfatt_valatt
   
   We can observe that this change also benefits other operators, since 
there is less dequantization/quantization overhead and fewer memory reorders.
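
   For reference, a NumPy sketch of what the QK operator computes, assuming the input packs the Q, K, and V projections interleaved per head along the last axis (the exact layout and scaling are assumptions made for illustration, not the operator's actual kernel):

```python
import numpy as np

def interleaved_selfatt_qk(qkv, num_heads):
    # qkv: (seq_len, batch, 3 * num_heads * head_dim), with Q, K, V
    # interleaved per head along the last axis (assumed layout).
    seq_len, batch, hidden3 = qkv.shape
    head_dim = hidden3 // (3 * num_heads)
    x = qkv.reshape(seq_len, batch, num_heads, 3, head_dim)
    q = x[:, :, :, 0, :]  # (seq_len, batch, num_heads, head_dim)
    k = x[:, :, :, 1, :]
    # Rearrange to (batch * num_heads, seq_len, head_dim).
    q = q.transpose(1, 2, 0, 3).reshape(batch * num_heads, seq_len, head_dim)
    k = k.transpose(1, 2, 0, 3).reshape(batch * num_heads, seq_len, head_dim)
    # Scaled attention scores: Q @ K^T / sqrt(head_dim).
    return np.matmul(q, k.transpose(0, 2, 1)) / np.sqrt(head_dim)

# seq_len=8, batch=2, num_heads=4, head_dim=16
scores = interleaved_selfatt_qk(
    np.random.rand(8, 2, 3 * 4 * 16).astype(np.float32), num_heads=4)
print(scores.shape)  # → (8, 8, 8), i.e. (batch * num_heads, seq, seq)
```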
   
   Thanks to @grygielski for his great contribution to this change.
   
   ## Checklist ##
   ### Essentials ###
   - [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], 
[FEATURE], [DOC], etc)
   - [x] Changes are complete (i.e. I finished coding on this PR)
   - [x] All changes have test coverage
   
   

