bgawrych opened a new pull request #20163: URL: https://github.com/apache/incubator-mxnet/pull/20163
## Description ##

This change adds oneDNN support for two operators:
- `_contrib_interleaved_matmul_selfatt_qk`
- `_contrib_interleaved_matmul_selfatt_valatt`

Both operators are used when the MKLDNN/MKLDNN_QUANTIZE backend is chosen. There is no performance difference between the MKL fp32 and oneDNN fp32 implementations, but the main advantage is the ability to utilize the int8 data type.

10 iterations of BERT-Large (gluon-nlp v0.10.x) on an Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz:

**MKL implementation (fp32, as int8 is not supported):**

**oneDNN implementation (int8):**

Operator mapping:
- `_contrib_interleaved_matmul_selfatt_qk` => `_sg_mkldnn_selfatt_qk`
- `_contrib_interleaved_matmul_selfatt_valatt` => `_sg_mkldnn_selfatt_valatt`

We can observe that this change positively influences other operators as well, since there is less dequantization/quantization overhead and there are fewer memory reorders.

Great thanks to @grygielski for his contribution to this change.

## Checklist ##

### Essentials ###

- [x] PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc.)
- [x] Changes are complete (i.e. I finished coding on this PR)
- [x] All changes have test coverage
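For readers who want to try the fused operators described above, a minimal sketch of how the MKLDNN subgraph backend can be selected in MXNet 1.x is shown below. This is an illustrative assumption, not part of the PR itself; the `MXNET_SUBGRAPH_BACKEND` environment variable is the documented way to enable the MKLDNN subgraph pass, while the MKLDNN_QUANTIZE backend is typically selected through the quantization API rather than this variable.

```shell
# Hedged sketch (not from this PR): enable the MKLDNN subgraph backend so
# that MXNet's graph partitioner can replace
# _contrib_interleaved_matmul_selfatt_qk / _contrib_interleaved_matmul_selfatt_valatt
# with the fused _sg_mkldnn_selfatt_qk / _sg_mkldnn_selfatt_valatt operators
# at inference time.
export MXNET_SUBGRAPH_BACKEND=MKLDNN
echo "Subgraph backend: $MXNET_SUBGRAPH_BACKEND"
```

Your own inference script would then be run in the same environment; int8 execution additionally requires passing the model through MXNet's quantization flow with the MKLDNN_QUANTIZE backend.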
